Abstract

Information  has  intrinsic  geometric  and  topological  structure,  arising  from  relative 
relationships  beyond  absolute  values  or  types.  For  instance,  the  fact  that  two  people 
did  or  did  not  share  a  meal  describes  a  relationship  independent  of  the  meal’s  ingredients. 
Such  relationships  give  rise  to  lattices.  Lattices  have  topology.  That  topology  informs  the 
ways  in  which  information  may  be  observed,  hidden,  inferred,  and  dissembled.  Privacy 
preservation  may  be  understood  as  finding  isotropic  topologies,  in  which  relationships 
appear  homogeneous.  Moreover,  the  underlying  lattice  structure  of  those  topologies  has  a 
temporal  aspect,  which  reveals  how  isotropy  may  degrade  over  time,  thereby  puncturing 
privacy. 

Dowker’s  Theorem  establishes  a  homotopy  equivalence  between  two  simplicial 
complexes  derived  from  a  relation.  From  a  privacy  perspective,  one  complex  describes 
individuals  with  common  attributes,  the  other  describes  attributes  shared  by  individuals. 
The  homotopy  equivalence  is  an  alignment  of  certain  common  cores  of  those  complexes, 
effectively  interpreting  sets  of  individuals  as  sets  of  attributes,  and  vice-versa.  That 
common  core  has  a  lattice  structure.  An  element  in  the  lattice  consists  of  two  components, 
one  being  a  set  of  individuals,  the  other  being  an  equivalent  set  of  attributes.  The  lattice 
operations  join  and  meet  each  amount  to  set  intersection  in  one  component  and  set  union 
followed  by  a  potentially  privacy-puncturing  inference  in  the  other  component. 

One  objective  of  this  research  has  been  to  understand  the  topology  of  the  Dowker 
complexes,  from  a  privacy  perspective.  First,  privacy  loss  appears  as  simplicial  collapse  of 
free  faces.  The  actual  collapse  is  local,  but  the  property  of  fully  preserving  both  attribute 
and  association  privacy  requires  a  global  condition:  a  particular  kind  of  spherical  hole. 
Second,  by  looking  at  the  link  of  an  individual  in  its  encompassing  Dowker  complex,  one  can 
characterize  that  individual’s  privacy  via  another  sphere  condition.  That  characterization 
generalizes  to  group  privacy.  Third,  even  when  long-term  privacy  is  impossible,  homology 
provides  lower  bounds  on  how  an  individual  may  defer  identification,  when  that  individual 
has  control  over  how  to  reveal  attributes.  Intuitively,  the  idea  is  to  first  reveal  information 
that  could  otherwise  be  inferred.  This  last  result  in  particular  highlights  privacy  as  a 
dynamic  process.  Privacy  loss  may  be  cast  as  gradient  flow.  Harmonic  flow  for  privacy 
preservation  may  be  fertile  ground  for  future  research. 

1  Introduction 

Privacy  is  the  ability  to  control  how  much  an  individual  or  entity  reveals  about  itself  to  others. 

Fundamental  research  into  privacy  seeks  to  understand  the  limits  of  that  ability. 

A  brief  history  of  privacy  should  include  the  following: 

•  The  right  to  privacy  as  a  legal  principle,  appearing  in  an  1890  Harvard  Law  Review 
article  [20].  The  article  was  a  reaction  to  the  then  modern  technology  of  photography 
and  the  dissemination  of  gossip  via  print  media. 

•  A  demonstration  linking  supposedly  anonymous  information  with  public  data,  thereby 
revealing  sensitive  information  [17].  The  demonstration  employed  birth  date,  gender,  and 
zip  code  to  link  anonymous  public  insurance  information  with  voter  registration  data. 
Doing  so  produced  the  health  record  of  the  governor  of  Massachusetts.  This  privacy 
failure  suggested  a  first  form  of  homogenization,  called  k-anonymity.  Roughly,  the  idea 
was  to  structure  databases  in  such  a  way  that  a  database  could  respond  to  any  query 
with  an  answer  consisting  of  no  fewer  than  k  individuals  matching  the  query  parameters. 

•  The discovery that it is impossible to preserve the privacy of an individual for even
a single attribute in the face of repeated statistical queries over a population [2], unless
answers to those queries are purposefully perturbed with noise of magnitude on the order
of at least √n. Here n is the size of the population. The significance of this discovery is
to underscore how difficult it is to preserve privacy while retaining information utility.

•  Netflix  Prize.  In  2006,  Netflix  offered  a  $1M  prize  for  an  algorithm  that  would  predict 
viewer  preferences  better  than  Netflix’s  internal  algorithm.  Netflix  made  available  some 
of  its  historical  user  preferences,  in  anonymized  form,  as  a  basis  for  the  competition.  Once 
again,  it  turned  out  that  one  could  link  this  anonymized  data  with  other  publicly  available 
databases,  resulting  in  the  potential  (and  in  some  cases  actual)  identification  of  Netflix 
viewers  and  their  entire  viewing  history  [15].  Whereas  in  the  earlier  health  example,  a 
few  specific  observables  made  linking  possible  (global  coordinates,  one  might  say,  namely 
birth  date,  gender,  zip  code),  in  the  Netflix  example,  the  intrinsic  geometric  structure  of 
the  database  facilitated  linking  via  a  wide  variety  of  observables  (local  landmarks,  one 
might say, namely movies that were characteristic for each individual). Key was sparsity
of  information:  8  movie  ratings  and  dates  were  generally  enough  to  uniquely  characterize 
99%  of  viewers  in  the  Netflix  Prize  dataset,  even  with  errors  in  the  ratings  and  dates. 

•  Differential  Privacy  [5,  4]  seeks  to  avoid  the  previous  privacy  failures  by  focusing  on 
local  rather  than  absolute  privacy  guarantees.  The  underlying  approach  in  differential 
privacy  is  for  a  database  to  answer  statistical  queries  with  a  particular  stochastic  blurring. 
Specifically,  the  probability  that  an  interrogator  of  the  database  will  make  any  particular 
inference  should  depend  only  in  a  very  small  way  on  whether  any  one  individual  does  or 
does  not  have  a  particular  attribute  (such  as  even  being  in  the  database).  We  might  call 
this  stochastic  homogeneity. 

•  Randomized  Response.  Differential  privacy  is  further  significant  because  it  makes 
explicit  the  dynamic  nature  of  privacy;  there  may  be  no  enduring  privacy  guarantees  but 
there are differential guarantees. A particular form is randomized response, a technique
used  in  the  social  sciences  to  elicit  reliable  aggregate  answers  to  sensitive  questions,  asking 
the  question  of  many  people,  but  perturbing  individual  answers  stochastically  so  as  not 
to  learn  much  about  any  one  individual  [19].  A  version  has  been  employed  by  Google 
to  find  malware  [8].  (We  note  a  form  of  ergodicity:  the  averaging  that  would  destroy 
privacy  for  an  individual  with  repeated  queries  over  time  allows  for  utility  of  information 
at  any  instant  in  time  over  a  large  population.) 
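
The trade-off between individual deniability and aggregate utility in randomized response can be illustrated with a small simulation. This is a sketch of one common coin-flip variant, not necessarily the protocol of [19] or the system of [8]; the function names, the population size, and the 30% prevalence are illustrative assumptions.

```python
import random


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """With probability 1/2 report the truth; otherwise report a fair coin flip."""
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_fraction(reports) -> float:
    """Invert the perturbation: E[report] = 0.5 * p + 0.25, so p = 2 * mean - 0.5."""
    mean = sum(reports) / len(reports)
    return 2.0 * mean - 0.5


# Hypothetical population: 30% of 10,000 people hold the sensitive attribute.
rng = random.Random(0)
population = [i < 3000 for i in range(10000)]
reports = [randomized_response(t, rng) for t in population]
estimate = estimate_fraction(reports)
```

Any single report is only weak evidence about the person who gave it, yet the aggregate estimate lands close to the true fraction when the population is large.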

Privacy  has  both  a  combinatorial  component  and  a  statistical  component.  Prior  research 
has  largely  focused  on  statistical  techniques,  both  to  preserve  privacy  and  to  puncture  privacy. 
One  of  the  goals  of  this  research  is  to  understand  the  combinatorial  component  of  privacy, 
leading  naturally  to  methods  from  combinatorial  topology. 

A  desire  to  understand  the  geometry  and  topology  of  the  types  of  inferences  revealed  by 
the  Netflix  Prize  formed  the  specific  motivation  for  our  research  initially.  Subsequently,  we 
realized  that  the  lattice  structure  found  in  that  geometry  had  broader  applicability,  providing 
an  ability  to  model  the  dynamics  of  privacy  more  generally. 
2  Outline

The  remaining  sections  and  appendices  present  the  following  material: 

3:  Toy  examples  illustrating  how  a  relation  may  lead  to  privacy  loss  in  the  presence  of 
background information. This section also introduces the doubly-labeled poset associated
with  a  relation,  to  model  such  inferences.  The  elements  of  the  poset  are  pairs,  each  a  set 
of  individuals  and  a  set  of  attributes. 

4:  Formal  description  of  the  Galois  Connection  associated  with  a  relation.  The  section 
first  defines,  for  any  relation,  two  simplicial  complexes  called  Dowker  complexes.  One 
complex  represents  sets  of  individuals  with  shared  attributes,  the  other  represents  sets 
of  attributes  shared  by  individuals.  The  Galois  Connection  then  establishes  a  homotopy 
equivalence  between  the  Dowker  complexes,  thereby  generating  the  relation’s  doubly- 
labeled  poset.  The  homotopy  equivalence  gives  rise  to  closure  operators,  with  “closure” 
in  the  poset  modeling  inference  of  unobserved  attributes  from  observed  attributes  (or 
unobserved  individuals  from  observed  individuals).  The  section  defines  attribute  privacy 
and  association  privacy. 

5:  A  characterization  of  privacy  in  terms  of  the  absence  of  free  faces  in  the  relevant  Dowker 
complex. This section observes as well that the only connected relations able to preserve
both attribute and association privacy must look either like linear cycles or boundary
complexes.  In  particular,  the  number  of  individuals  and  attributes  must  be  the  same. 

6:  Conditional  relations  as  models  for  simplicial  links.  A  conditional  relation  is  much  like 
a  conditional  probability  distribution.  It  might,  for  instance,  represent  the  possible 
arrangement  of  remaining  attributes  among  individuals,  after  some  attributes  have 
already  been  observed. 

7:  A  characterization  of  individual  and  group  privacy  in  terms  of  spherical  and  boundary 
complexes  for  the  relation  that  models  the  individual’s  or  group’s  link  in  its  Dowker 
complex. 

8:  A  brief  exploration  of  holes  in  relations,  focusing  on  attribute  spaces  generated  by  bits. 

9:  A  small  example  exploring  the  possibility  of  increasing  privacy  by  change-of-coordinate 
transformations. 

10:  A  lengthy  exploration  of  how  someone  can  delay  identification  by  releasing  attributes 
selectively  in  a  particular  order.  This  idea  leads  to  the  notion  of  informative  attribute 
release sequences, how to find such sequences in the Galois lattice, and the value of
homology  as  a  lower  bound  for  the  number  and  length  of  such  sequences. 

11:  Computation of the homology and maximal informative attribute release sequences present
in  two  relations  found  on  the  world  wide  web.  One  relation  describes  Olympic  athletes 
and  their  medals,  the  other  describes  jazz  musicians  and  their  bands. 
12:  A more general perspective of inference as motion in lattices, not necessarily directly
derived  from  a  relation.  This  perspective  suggests  connections  to  randomized  response 
techniques. 

13:  An  examination  of  the  ability  to  obfuscate  strategies  and/or  goals  in  graphs  where  motions 
may be nondeterministic or stochastic.

14:  A  possible  category  for  representing  relations,  along  with  an  analysis  of  morphism 
properties. The morphisms between relations in this category induce simplicial and
therefore  continuous  maps  on  the  Dowker  complexes.  This  section  shows  how  a  surjective 
morphism  at  the  set  level  generates  the  image  lattice  via  lattice  operations  performed  on 
images  of  certain  elements  from  the  domain  lattice. 

A:  A  summary  of  the  basic  notation  and  definitions  used  in  this  report. 

B:  A summary of the basic tools used in this report, establishing the homotopy equivalences
and  closure  operators  mentioned  above. 

C:  Construction  of  links  and  deletions,  and  examination  of  the  privacy  properties  each  inherits 
from  its  encompassing  relation.  This  section  explores  the  significance  of  free  faces  in  the 
Dowker  complexes.  The  section  further  proves  that  a  relation  with  more  attributes  than 
individuals  cannot  preserve  attribute  privacy. 

D:  Proof  that  the  problem  of  finding  a  minimal  set  of  attributes  from  which  another  attribute 
may be inferred is NP-complete. This stands in contrast to the observation that the
problem  of  finding  some  set  of  attributes  from  which  another  may  be  inferred  (or 
reporting  that  no  such  set  exists)  is  computable  in  polynomial  time. 

E:  Detailed  proofs  of  the  results  claimed  in  Section  7.  Also  a  detailed  proof  of  the  assertion 
from  Section  5  regarding  relations  that  preserve  both  attribute  and  association  privacy. 

F:  Detailed  proofs  of  the  connection  between  maximal  chains  in  the  Galois  lattice  and 
informative  attribute  release  sequences.  When  such  sequences  are  order-independent 
they  correspond  to  spherical  holes,  leading  to  the  concept  of  an  isotropic  sequence. 

G:  Detailed  proof  that  homology  establishes  a  lower  bound  for  the  number  and  length  of 
maximal  chains  in  a  relation’s  Galois  lattice,  and  thus  for  the  number  and  length  of 
informative  attribute  release  sequences  that  may  be  used  to  delay  identification. 

H:  An  application  of  the  previous  results  with  the  aim  of  obfuscating  the  identification  of 
strategies  for  attaining  goals  in  graphs  with  uncertain  transitions. 

I:  Detailed proofs of the assertions of Section 14 regarding morphisms.

J:  Some  additional  examples: 

1.  Dunce  Hat:  modeled  as  a  relation  for  which  the  Dowker  attribute  complex  is 
contractible  but  has  no  free  attribute  faces,  meaning  the  relation  preserves  attribute 
privacy. 
2.  Disinformation: An example that glues together two copies of the Möbius strip,
thereby  removing  free  faces  and  creating  a  form  of  homogeneity  that  preserves 
attribute  privacy  yet  retains  the  utility  of  identifiability. 

3.  Insufficient  Representation:  If  there  are  insufficiently  many  individuals  in  a  relation 
generated  by  bits,  attribute  inference  is  possible. 

4.  A  Matching  Example:  When  many  individuals  are  being  observed,  cardinality 
constraints  allow  for  inferences  beyond  those  discussed  in  this  report.  One  can 
model some such inferences using links and joins. We have not reported that work
here; we merely provide one example.
List of Primary Symbols

Symbol                            Typical Meaning

X                                 space of individuals
Y                                 space of attributes
R                                 relation on X × Y
X_y                               individuals with attribute y (usually in the context of relation R)
Y_x                               attributes of individual x (usually in the context of relation R)
Q                                 another relation, often representing a link in a complex
Σ, Γ                              generic simplicial complexes
Ψ_R                               complex generated by sets of individuals with a common attribute
Φ_R                               complex generated by sets of attributes shared by some individual
σ                                 usually a simplex representing individuals in Ψ_R
γ                                 usually a simplex representing attributes in Φ_R
φ_R                               homotopy equivalence from sets of individuals to shared attributes
ψ_R                               homotopy equivalence from sets of attributes to sharing individuals
P                                 partially ordered set (poset)
𝔉(Σ)                              face poset of the simplicial complex Σ
Δ(P)                              order complex of the poset P
P_R                               doubly-labeled poset associated with relation R
L                                 (inference) lattice
P_R^+                             Galois lattice formed from P_R
{(σ_k, γ_k) < ··· < (σ_0, γ_0)}   chain of length k in the lattice P_R^+
y_1, …, y_k                       informative attribute release sequence (for a relation R)
V                                 set of vertices in a simplicial complex or in a graph
∂(V)                              simplicial boundary complex with vertices V
S^{-1}                            sphere of dimension −1, modeling the empty complex {∅}
S^1                               circle
S^{n-2}                           sphere of dimension n−2
C_k(Σ; ℤ)                         group of simplicial k-chains over Σ, with integer coefficients
∂                                 (family of) reduced boundary map(s) C_k(Σ; ℤ) → C_{k-1}(Σ; ℤ)
H̃_k(Σ; ℤ)                         reduced k-dimensional homology group of Σ, with integer coefficients
G                                 nondeterministic or stochastic graph
Δ_G                               strategy complex of a graph
ag                                source complex of a graph
≃                                 homotopy equivalence
∗                                 simplicial join
∨                                 either topological wedge sum or lattice join
∧                                 lattice meet

3  Privacy:  Relations  and  Partially  Ordered  Sets 

Our  investigation  of  privacy  in  this  report  will  be  in  terms  of  relations.  As  we  will  see  in  this 
section  and  the  next,  relations  give  rise  to  simplicial  complexes,  which  give  rise  to  partially 
ordered  sets,  which  expose  an  underlying  lattice  structure.  That  lattice  structure  makes  explicit 
how  privacy  may  be  preserved  or  lost  through  so-called  background  knowledge.  As  we  will  see 
in  Section  10,  the  lattice  structure  also  makes  explicit  how  identification  may  be  delayed  by 
careful  release  of  information. 


3.1  A  Toy  Example:  Health  Data  and  Attribute  Privacy 

Consider  the  following  relation  H,  describing  the  results  of  a  health  study  for  four 
patients  and  three  attributes.  The  patients  have  been  anonymized  and  are  represented 
simply by the set of numbers {1, 2, 3, 4}. The three attributes are drawn from the set
{smokes, has_cancer, drinks_soda}.

One  can  describe  a  relation  equivalently  either  as  a  matrix  or  as  a  set  of  pairs: 


Relation H as a matrix:

  H    SMOKES   HAS_CANCER   DRINKS_SODA
  1      •          •
  2                 •             •
  3                               •
  4                               •

Relation H as a set of pairs:

{(1, smokes), (1, has_cancer), (2, has_cancer), (2, drinks_soda),
(3, drinks_soda), (4, drinks_soda)}.
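
The equivalence of the two descriptions can be sketched in a few lines of Python. This is a minimal illustration, assuming a relation is stored as a set of (individual, attribute) pairs; the helper names rows, columns, to_matrix, and from_matrix are hypothetical and not part of the report.

```python
# Relation H as a set of (individual, attribute) pairs.
H_pairs = {
    (1, "smokes"), (1, "has_cancer"),
    (2, "has_cancer"), (2, "drinks_soda"),
    (3, "drinks_soda"), (4, "drinks_soda"),
}
individuals = [1, 2, 3, 4]
attributes = ["smokes", "has_cancer", "drinks_soda"]


def rows(pairs):
    """Map each individual x to the set of attributes related to x."""
    out = {}
    for x, y in pairs:
        out.setdefault(x, set()).add(y)
    return out


def columns(pairs):
    """Map each attribute y to the set of individuals related to y."""
    out = {}
    for x, y in pairs:
        out.setdefault(y, set()).add(x)
    return out


def to_matrix(pairs, individuals, attributes):
    """Boolean matrix: one row per individual, one column per attribute."""
    return [[(x, y) in pairs for y in attributes] for x in individuals]


def from_matrix(matrix, individuals, attributes):
    """Recover the set of pairs from the Boolean matrix."""
    return {
        (x, y)
        for x, row in zip(individuals, matrix)
        for y, bit in zip(attributes, row)
        if bit
    }


H_matrix = to_matrix(H_pairs, individuals, attributes)
```

Converting to a matrix and back returns the original set of pairs, which is the sense in which the two descriptions carry the same information.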


Assumptions 

Before  discussing  privacy  further,  we  make  some  assumptions  that  hold  throughout  the  report: 

Assumption of Relational Completeness: We generally assume that a relation is complete,
meaning it is not missing any elements (a relation could contain extra elements,
which  may  be  useful  as  disinformation).  For  example,  if  we  observe  that  someone  drinks  soda 
and  has  cancer  in  relation  H,  then  we  would  conclude  that  we  are  observing  individual  #2. 
We  would  be  surprised  to  see  that  individual  smoke.  If  for  some  reason  we  ever  do  see  the 
individual  smoke,  then  we  would  deem  our  observations  to  be  inconsistent  with  relation  H.  The 
meaning  of  inconsistency  depends  on  context.  At  top-level  it  may  mean  that  the  relation  or 
observation  is  errorful.  When  making  conditional  observations,  an  inconsistency  may  actually 
supply  useful  information,  as  we  will  see  in  Lemma  11  on  page  29. 

Assumption  of  Observational  Monotonicity:  Even  though  we  assume  relations  are 
complete,  we  do  not  assume  that  observations  are  complete.  Instead  we  assume:  Observing 
that  an  individual  has  a  particular  attribute  is  meaningful;  lack  of  such  an  observation  does 
not  necessarily  imply  that  an  individual  fails  to  have  the  unobserved  attribute.  The  motivation 
for this assumption is that one may yet discover that the individual has the attribute. For
example,  suppose  we  observe  someone  (whom  we  know  to  be  part  of  relation  H)  drinking  soda. 
Even  if  that  is  all  we  observe,  we  do  not  conclude  that  the  individual  is  cancer  free.  It  could 
be  that  we  might  yet  observe  the  individual  to  have  cancer. 
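
The interplay of completeness and monotonicity can be sketched as a filter over individuals. This is an illustrative sketch only: the dictionary H (individual to attribute set) and the function name consistent_individuals are assumptions made for the example. An observation rules out only those individuals who lack an observed attribute; an empty result signals an inconsistency with the relation.

```python
# Relation H, stored as individual -> set of attributes (assumed encoding).
H = {
    1: {"smokes", "has_cancer"},
    2: {"has_cancer", "drinks_soda"},
    3: {"drinks_soda"},
    4: {"drinks_soda"},
}


def consistent_individuals(attrs_of, observed):
    """Individuals whose attribute sets contain every observed attribute.

    Unobserved attributes are never held against an individual
    (observational monotonicity). An empty result means the observations
    are inconsistent with the (complete) relation.
    """
    return {x for x, ys in attrs_of.items() if observed <= ys}


soda_only = consistent_individuals(H, {"drinks_soda"})
soda_and_cancer = consistent_individuals(H, {"drinks_soda", "has_cancer"})
all_three = consistent_individuals(H, {"drinks_soda", "has_cancer", "smokes"})
```

Observing only soda leaves three candidates, one of whom has cancer, so no conclusion about cancer follows. Adding the cancer observation isolates individual #2, and a further observation of smoking leaves no candidates at all.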

If absence of an attribute is significant, and that absence is measurable, then both the
attribute  and  its  negation  could  and  perhaps  should  appear  explicitly  in  the  relation  as  distinct 
mutually  exclusive  attributes.  For  instance,  Prime  versus  Composite  might  be  such  a  pair 
of  attributes  for  integers  greater  than  1. 

Assumption  of  Observational  Accuracy:  We  assume  that  observations  are  accurate.  For 
instance,  if  we  observe  an  integer  to  be  either  Prime  or  Composite,  then  we  do  so  correctly. 

Comments:  The  three  assumptions  above  are  desiderata  for  how  the  mathematical 
abstractions  of  this  report  fit  into  the  real  world.  Some  comments  are  in  order: 

•  In  and  of  itself,  a  relation  defines  a  particular  kind  of  world,  a  bipartite  graph,  and  there 
is  no  need  for  something  like  a  completeness  assumption. 

•  The monotonicity and accuracy assumptions then describe a sensor for that world and
how  to  interpret  observations. 

The  purpose  of  the  assumptions  in  the  real  world  is  largely  to  ensure  consistency  between 
different  relations  and  with  observations. 

•  The monotonicity assumption is important because information generally aggregates
asynchronously.  Together  with  the  other  assumptions,  this  assumption  means  that  one 
may  view  relations  as  monotone  Boolean  functions,  and  thus  may  leverage  methods  from 
combinatorial  topology. 

•  One  may  incorporate  errors  into  the  relational  and  observational  models,  for  instance  by 
blurring  a  relation.  For  very  large  integers,  a  relation  might  allow  some  integers  to  have 
both  Prime  and  Composite  as  attributes.  Although  an  integer  is  one  or  the  other,  the 
relation  admits  to  uncertainty  by  allowing  both  attributes  at  once.  Indeed,  some  relations 
purposefully  introduce  such  blurring  to  preserve  privacy.  And,  in  robotics,  relational 
blurring  in  a  sensor-compatible  fashion  can  be  a  useful  technique  for  establishing  the 
topology  of  a  region,  for  instance  when  dualizing  sensors  and  landmarks  [10]. 

Privacy  Implications 

If  the  health  study  H  is  publicly  available,  then  it  has  the  following  privacy  implications: 

•  Suppose  someone  named  Bob  tells  his  friend  Alice  that  he  was  part  of  the  study.  Alice 
knows  that  Bob  smokes  everywhere  he  goes,  so  she  can  infer  that  he  is  Patient  #1  and 
has  cancer.  (This  is  an  example  of  inference  in  a  relation  using  background  knowledge.) 

•  Suppose  Cindy  is  Patient  #2.  She  has  full  privacy  as  far  as  relation  H  is  concerned.  In 
particular,  as  we  saw  already,  Cindy  can  tell  her  friends  that  she  was  part  of  the  health 
study  while  drinking  soda  and  those  friends  will  not  be  able  to  conclude  that  she  has 
cancer. 

•  Patients  #3  and  #4  are  not  only  indistinguishable  from  each  other  but  also  from  Cindy 
(patient  #2).  This  is  a  very  strong  form  of  anonymity.  Even  if  one  of  them  reveals  that 
s/he  drinks  soda,  s/he  will  remain  indistinguishable  from  the  other  two  patients  who 
drink  soda. 

Caveat:  In  the  last  case,  if  Cindy  reveals  that  she  has  cancer  and  is  seen  to  be  different 
from  the  other  individuals,  then  one  may  be  able  to  remove  her  from  the  relation,  narrowing 
the  focus  and  creating  a  new  relation  that  may  allow  additional  inferences.  Similar  caveats 
hold  for  the  other  bullets.  Deletions  are  discussed  further  in  Appendix  C. 
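
The inferences in the bullets above can be reproduced mechanically. The following is a sketch under the assumption that relation H is stored as a dictionary from individuals to attribute sets; the names individuals_with and attribute_closure are hypothetical. Given a set of observed attributes, the code collects everyone who has them all, then reports the attributes common to that whole group.

```python
H = {
    1: {"smokes", "has_cancer"},
    2: {"has_cancer", "drinks_soda"},
    3: {"drinks_soda"},
    4: {"drinks_soda"},
}


def individuals_with(attrs_of, gamma):
    """All individuals having every attribute in gamma."""
    return {x for x, ys in attrs_of.items() if gamma <= ys}


def attribute_closure(attrs_of, gamma):
    """Attributes shared by everyone who has all of gamma.

    Returns None when no individual has all of gamma, in which case the
    observation is inconsistent with the relation.
    """
    sigma = individuals_with(attrs_of, gamma)
    if not sigma:
        return None
    return set.intersection(*(attrs_of[x] for x in sigma))


from_smoking = attribute_closure(H, {"smokes"})
from_soda = attribute_closure(H, {"drinks_soda"})
```

Smoking expands to smoking plus cancer and isolates Patient #1, whereas drinking soda expands to nothing further and leaves three candidates.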


Modifying a Relation to Increase Privacy: We can make a small change in relation H
that enhances privacy. If we artificially give patient #3 the attribute SMOKES, then we obtain
the following modified relation H':


  H'   SMOKES   HAS_CANCER   DRINKS_SODA
  1      •          •
  2                 •             •
  3      •                        •
  4                               •

Now  Bob  may  reveal  to  Alice  that  he  was  part  of  the  health  study  without  Alice  being  able 
to  infer  that  he  has  cancer,  even  though  she  knows  that  everyone  knows  that  he  smokes.  In 
fact,  more  generally,  one  can  no  longer  infer  cancer  from  smoking. 

Such an artificial entry in the relation is a form of disinformation. (Terminology: we often
use the term 'entry' to mean an element of a relation, as in a matrix.) It certainly skews
statistics and utility. It also increases privacy.
3.2  A Dual Perspective: Payroll Data and Association Privacy

The  previous  example  examined  a  relation  from  the  perspective  of  attribute  privacy:  we  were 
interested  in  understanding  how  observation  of  some  attribute(s)  implies  other  attribute(s).  A 
dual  perspective  is  association  privacy,  in  which  one  seeks  to  understand  how  some  associations 
between  individuals  imply  others. 

The  following  “salary”  relation  S  has  the  same  matrix  structure  as  H  did  earlier,  but  with 
different  semantics.  This  relation  represents  employees  {Bob,  Mary,  Frank,  Julie}  working  on 
secret  projects  {a,  b,  c}.  Now  the  employee  names  are  visible  so  that  a  payroll  clerk  can  disburse 
salaries  correctly,  but  the  actual  projects  are  anonymous. 

  S       a   b   c
  Bob     •   •
  Mary        •   •
  Frank           •
  Julie           •

The  salary  relation  S  facilitates  the  following  implications  regarding  individuals: 

•  If  someone  tells  the  payroll  clerk  that  Julie  is  the  lead  of  a  very  important  project,  then 
the  payroll  clerk  can  infer  that  Mary  and  Frank  may  have  valuable  information. 

•  In  contrast,  if  someone  tells  the  payroll  clerk  that  Bob  is  the  lead  of  a  very  important 
project,  the  payroll  clerk  cannot  be  sure  that  Mary  is  also  working  on  that  project. 

Regarding  disinformation:  Observe  how  adding  the  artificial  entry  (Julie,  a)  prevents  the 
payroll  clerk  from  inferring  that  Mary  and  Frank  have  valuable  information,  even  if  the  payroll 
clerk  knows  that  Julie  does: 

  S'      a   b   c
  Bob     •   •
  Mary        •   •
  Frank           •
  Julie   •       •
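
The association inferences for S and S' can be checked with the dual of attribute inference. This is a sketch under assumed encodings: each relation is a dictionary from employee to project set, and the function name associates is hypothetical. Starting from a group of employees, the code finds the projects they all share and then everyone who works on all of those projects.

```python
S = {
    "Bob": {"a", "b"},
    "Mary": {"b", "c"},
    "Frank": {"c"},
    "Julie": {"c"},
}
# S' adds the artificial entry (Julie, a).
S_prime = {**S, "Julie": {"a", "c"}}


def associates(projects_of, sigma):
    """Everyone who works on every project shared by all members of sigma.

    Returns None when the members of sigma share no project, since then
    the relation offers no common context from which to infer anything.
    """
    shared = set.intersection(*(projects_of[x] for x in sigma))
    if not shared:
        return None
    return {x for x, ps in projects_of.items() if shared <= ps}


julie_in_S = associates(S, {"Julie"})
bob_in_S = associates(S, {"Bob"})
julie_in_S_prime = associates(S_prime, {"Julie"})
```

In S, knowing about Julie implicates Mary and Frank, while knowing about Bob implicates nobody else. In S', the extra entry makes Julie's project set unique to her, so the inference about Mary and Frank disappears.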

3.3  Privacy  Preservation  and  Loss:  A  Poset  Model 


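  R    a   b   c
  1    •   •
  2        •   •
  3            •
  4            •

P_R:

  ({1,2}, {b})      ({2,3,4}, {c})

  ({1}, {a,b})      ({2}, {b,c})

Edges: ({1}, {a,b}) < ({1,2}, {b}); ({2}, {b,c}) < ({1,2}, {b}); ({2}, {b,c}) < ({2,3,4}, {c}).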

Figure 1: Relation R serves as a model for the two examples of Sections 3.1 and 3.2. The
doubly-labeled poset P_R describes the inferences facilitated by R.

Figure 1 shows a relation R that serves as a model for both the health example of Section 3.1
and the payroll example of Section 3.2. The relation is identical to those given earlier, but with
abstract labels in place of both individuals and attributes. The figure also depicts a partially
ordered set (poset) P_R, designed to model the inferences discussed previously. We refer to
that poset as the doubly-labeled poset associated with R. We next discuss the semantics of P_R.
Section 4 discusses the construction of P_R. The underlying concepts are important throughout
the report.

Semantics of the poset P_R:

•  Each element in the poset consists of a pair (σ, γ), with ∅ ≠ σ ⊆ {1, 2, 3, 4} describing a
set of individuals and ∅ ≠ γ ⊆ {a, b, c} describing a set of attributes. We say that the
poset element is labeled with σ and γ. The meaning of such a double-labeling is:

(a)  All individuals in σ have all attributes in γ.

(b)  If an individual has at least all the attributes in γ, then that individual must be
in σ. For example, we see that individual #2 and only individual #2 has both
attributes b and c in R.

(c)  If an attribute is shared by at least all individuals in σ, then that attribute must be
in γ. For example, attribute b and only attribute b is shared by both individuals
#1 and #2.

•  The partial order for P_R is described by the edges in the figure. There is an edge
between two elements (σ₁, γ₁) and (σ₂, γ₂) of P_R whenever the corresponding sets are
subset comparable. In particular, (σ₁, γ₁) ≤ (σ₂, γ₂) in P_R precisely when σ₁ ⊆ σ₂ and
γ₁ ⊇ γ₂. [Observe that the comparability (⊆ versus ⊇) is opposite for σ versus γ.]
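
Conditions (a) through (c) and the partial order can be turned into a brute-force enumeration. The sketch below is an illustration, not the construction of Section 4: it assumes R is stored as a dictionary from individuals to attribute sets, the names doubly_labeled_poset and leq are hypothetical, and the loop over all groups of individuals is exponential, which is acceptable only for toy relations.

```python
from itertools import combinations

R = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"c"}}


def doubly_labeled_poset(attrs_of):
    """All pairs (sigma, gamma), both nonempty, satisfying (a), (b), (c)."""
    elements = set()
    individuals = sorted(attrs_of)
    for k in range(1, len(individuals) + 1):
        for group in combinations(individuals, k):
            # gamma: every attribute shared by the whole group.
            gamma = frozenset(set.intersection(*(attrs_of[x] for x in group)))
            if not gamma:
                continue
            # sigma: every individual having all of gamma, as (b) requires.
            sigma = frozenset(x for x, ys in attrs_of.items() if gamma <= ys)
            elements.add((sigma, gamma))
    return elements


def leq(e1, e2):
    """(sigma1, gamma1) <= (sigma2, gamma2): sigma grows while gamma shrinks."""
    (s1, g1), (s2, g2) = e1, e2
    return s1 <= s2 and g1 >= g2


P_R = doubly_labeled_poset(R)
```

For the relation R of Figure 1 this produces four elements, and leq reproduces the order relations between them.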

Using the poset P_R for attribute inference:

Suppose γ is any nonempty subset of attributes in {a, b, c}. Then:

(i)  Either: no individual has all the attributes in γ. For example, no individual has both
attributes {a, c}. We would not expect to see γ and so γ does not appear in the poset.
(ii)  Or: γ is a subset of at least one set of attributes that does appear in the poset. In this
case, one may be able to enlarge γ nontrivially, resulting in privacy loss.

For example, imagine that individual #1 (Bob in our first example above) tells us that
he has attribute a (smokes). So γ = {a}. The poset then allows us to infer that Bob
must also have attribute b (has_cancer). Why? Because {a, b} is a minimal set in P_R
containing {a}.

We  can  say  yet  more:  The  element  labeled  with  {a,  b}  is  also  labeled  with  {1}.  So  now 
we  know  that  Bob  is  individual  #1. 

Regardless  of  whether  Bob  ever  actually  talks  to  us,  the  poset  tells  us  that  individual  #1 
could  suffer  privacy  loss,  and  in  fact,  is  uniquely  identifiable  without  needing  to  reveal 
everything  about  himself. 

Similar  reasoning  is  possible  for  association  inference,  as  we  saw  earlier. 


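  R'   a   b   c
  1    •   •
  2        •   •
  3    •       •
  4            •

P_R':

  ({1,3}, {a})      ({1,2}, {b})      ({2,3,4}, {c})

  ({1}, {a,b})      ({2}, {b,c})      ({3}, {a,c})

Edges: ({1}, {a,b}) < ({1,3}, {a}); ({1}, {a,b}) < ({1,2}, {b}); ({2}, {b,c}) < ({1,2}, {b});
({2}, {b,c}) < ({2,3,4}, {c}); ({3}, {a,c}) < ({1,3}, {a}); ({3}, {a,c}) < ({2,3,4}, {c}).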

Figure 2: A relation R' along with its doubly-labeled poset P_R'. The relation preserves
attribute privacy but allows a small amount of association inference: If one sees individual
#4 in some context c, then one can infer that individuals #2 and #3 are also present in that
same context, without needing to observe them directly.


Disinformation Revisited: Figure 2 shows relation $R'$, constructed from $R$ by adding an
entry of disinformation, much as we constructed $H'$ from $H$ earlier. The figure also shows the
doubly-labeled poset $P_{R'}$. Observe that it is no longer possible to infer {a, b} from {a} because
{a} now appears directly in the poset. The added entry has increased attribute privacy.

There  is,  however,  still  some  opportunity  for  making  association  inferences.  For  instance, 
knowing  that  individual  #4  (Julie  earlier)  works  on  an  important  secret  project  still  allows  the 
inference  that  individuals  #2  and  #3  might  have  valuable  information.  That  is  because  the 
minimal  set  containing  {4}  in  the  poset  is  {2,3,4}.  Notice  that  no  such  association  inference 
is  possible  if  someone  says  that  individual  #3  works  on  an  important  secret  project,  though 
that  would  have  been  possible  in  the  original  relation  R. 


4  The  Galois  Connection  for  Modeling  Privacy 

Section  3  showed  by  example  how  a  relation  determines  a  partially  ordered  set  (poset)  useful 
for  modeling  privacy.  The  elements  in  the  poset  are  pairs  —  a  set  of  attributes  and  a  set  of 
individuals  —  that  are  equivalent  from  the  relation’s  perspective.  Privacy  loss  occurs  when  an 
observer  has  data  (for  example,  background  knowledge)  that  is  not  directly  in  the  poset  but  is 
a  proper  subset  of  some  set  of  attributes  or  individuals  in  the  poset.  The  observer  may  then 
infer  some  additional  attributes  or  individuals.  This  section  develops  the  connection  between 
relations  and  posets  more  precisely,  continuing  to  use  the  earlier  examples  for  illustration.  See 
also  Appendix  B  for  additional  material  and  notation. 

4.1  Dowker  Complexes 

Definition 1 (Dowker Complexes). Let $X$ and $Y$ be finite discrete spaces and let $R$ be a
relation on $X \times Y$. This means $R$ is a set of pairs $(x, y)$, with $x \in X$ and $y \in Y$. We frequently
view $R$ as a matrix of 0s and 1s, or blank and nonblank entries, with $X$ indexing rows and $Y$
indexing columns.

(a) We often refer to elements of $X$ as individuals and to elements of $Y$ as attributes.

(b) For each $x \in X$, let $Y_x = \{y \in Y \mid (x, y) \in R\}$. Then $Y_x$ consists of all attributes of
individual $x$. We may view $Y_x$ as a row of $R$. The row is blank if $Y_x = \emptyset$.

(c) For each $y \in Y$, let $X_y = \{x \in X \mid (x, y) \in R\}$. Then $X_y$ consists of all individuals who
have attribute $y$. We may view $X_y$ as a column of $R$. The column is blank if $X_y = \emptyset$.

(d) We next define two simplicial complexes $\Phi_R$ and $\Psi_R$:

$$\Phi_R = \{\gamma \subseteq Y \mid \text{there exists } x \in X \text{ such that } (x, y) \in R \text{ for all } y \in \gamma\},$$

$$\Psi_R = \{\sigma \subseteq X \mid \text{there exists } y \in Y \text{ such that } (x, y) \in R \text{ for all } x \in \sigma\}.$$

Special cases: If $X$ and $Y$ are both nonempty, then the empty simplex $\emptyset$ is in both $\Phi_R$
and $\Psi_R$. Otherwise, with some exceptions discussed later (Section 6, Section 10, and
Appendix C), we take both complexes to be void.

We refer to $\Phi_R$ and $\Psi_R$ as Dowker complexes after the author of upcoming Theorem 2.

Interpretation: A nonempty set $\gamma$ of attributes is a simplex in $\Phi_R$ precisely when at least
one individual has all the attributes in $\gamma$ (and possibly more). We refer to any such
individual as a witness for $\gamma$.

Similarly, a nonempty set $\sigma$ of individuals is a simplex in $\Psi_R$ precisely when there is at
least one attribute that is shared by all the individuals in $\sigma$ (and possibly others). We refer
to any such attribute as a witness for $\sigma$.

Figure  3  shows  the  Dowker  complexes  for  the  relation  R  of  Section  3.3. 

Dowker's Theorem [3] says that the two simplicial complexes $\Phi_R$ and $\Psi_R$ have the same
homotopy type. As we will see, the maps establishing that homotopy equivalence define the
poset $P_R$ and describe how privacy may be lost.
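
As a concrete check of Definition 1, the following minimal Python sketch enumerates the nonempty simplices of both Dowker complexes. It assumes the relation is stored as a dictionary from each individual to its row $Y_x$. The entries used for $R$ are those that can be read off from Table 1, and the function names are illustrative, not part of the report.

```python
from itertools import combinations

# Rows Y_x of the example relation R (entries as read off from Table 1).
R = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"c"}}

def nonempty_subsets(s):
    """All nonempty subsets of s, as frozensets."""
    items = sorted(s)
    return {frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)}

def columns(rel):
    """Columns X_y of the relation: attribute -> set of individuals having it."""
    cols = {}
    for x, row in rel.items():
        for y in row:
            cols.setdefault(y, set()).add(x)
    return cols

def dowker_complexes(rel):
    """Return the nonempty simplices of (Phi_R, Psi_R).

    A set of attributes is a simplex of Phi_R when some row contains it;
    a set of individuals is a simplex of Psi_R when some column contains it.
    """
    phi_simplices = set().union(*(nonempty_subsets(row) for row in rel.values()))
    psi_simplices = set().union(*(nonempty_subsets(col) for col in columns(rel).values()))
    return phi_simplices, psi_simplices

Phi, Psi = dowker_complexes(R)
print(sorted(map(sorted, Phi)))   # attribute simplices
print(sorted(map(sorted, Psi)))   # association simplices
```

For this relation the attribute complex has five nonempty simplices and the association complex has nine, matching the number of rows in the two halves of Table 1.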
Figure 3: Simplicial complexes $\Phi_R$ and $\Psi_R$ associated with relation $R$.


Theorem 2 (Dowker [3]). Suppose $R$ is a relation on $X \times Y$. Let $\Phi_R$ and $\Psi_R$ be as in
Definition 1. Then $\Phi_R$ and $\Psi_R$ are homotopy equivalent.

Every nonvoid simplicial complex $\Sigma$ determines a partially ordered set $\mathfrak{F}(\Sigma)$ called the face
poset of $\Sigma$. The elements of this poset are the nonempty simplices of $\Sigma$, partially ordered by
set inclusion. (Recall that 'poset' is short for 'partially ordered set'.)

For the finite setting, the homotopy equivalence of Dowker's Theorem may be seen by
explicit formulas for maps between the face posets of the two Dowker complexes. These maps
describe what is known as a Galois Connection. [This construction also appears as a core tool
within the field of Formal Concept Analysis [21, 9].] Here are the formulas:

$$\phi_R : \mathfrak{F}(\Psi_R) \to \mathfrak{F}(\Phi_R), \qquad \sigma \mapsto \bigcap_{x \in \sigma} Y_x$$

$$\psi_R : \mathfrak{F}(\Phi_R) \to \mathfrak{F}(\Psi_R), \qquad \gamma \mapsto \bigcap_{y \in \gamma} X_y$$

These two maps are inverse homotopy equivalences. One sees this by considering the maps
$\phi_R \circ \psi_R$ and $\psi_R \circ \phi_R$. These compositions turn out to be what are called closure operators on
the face posets $\mathfrak{F}(\Phi_R)$ and $\mathfrak{F}(\Psi_R)$, respectively, implying that each is homotopic to the identity
map, thereby establishing the desired homotopy equivalence. See Appendix B for detailed
computations; see the next subsection for interpretation.

4.2  Inference  from  Closure  Operators 

A poset map $f : P \to P$ is said to be a closure operator whenever $x \le f(x)$ and $f(f(x)) = f(x)$
for all $x \in P$. If $f$ is a closure operator, then it induces a homotopy equivalence between $P$
and the image $f(P)$ (see [1, 18]).

One can think of a closure operator as “pushing elements up” in the poset. From a privacy
perspective, “pushing up” amounts to inference. Specifically, $(\phi_R \circ \psi_R)(\gamma) \setminus \gamma$ consists of all
additional attributes that may be inferred from observing attributes $\gamma$, while $(\psi_R \circ \phi_R)(\sigma) \setminus \sigma$
consists of all additional individuals that may be inferred from observing individuals $\sigma$.

Comment: The formulas for $\phi_R$ and $\psi_R$ in Section 4.1 and the inference perspective extend
to the empty simplex. Observe that $\psi_R(\emptyset) = X$, so $(\phi_R \circ \psi_R)(\emptyset)$ consists of all attributes that
every individual in $X$ has. If $(\phi_R \circ \psi_R)(\emptyset) \ne \emptyset$ then the attributes $(\phi_R \circ \psi_R)(\emptyset)$ are inferable
“for free” from $R$, that is, without making any observations. Similarly, $(\psi_R \circ \phi_R)(\emptyset)$ consists
of all individuals who have every attribute in $Y$.
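
The inference reading of the closure operators can be tried out directly. The sketch below is a minimal illustration, assuming the rows of $R$ as read off from Table 1; the helper names `inferred_attributes` and `inferred_individuals` are hypothetical, chosen only for this example.

```python
# Rows Y_x of the example relation R (entries as read off from Table 1).
R = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"c"}}
Y = set().union(*R.values())

def phi(sigma):
    """phi_R: attributes shared by every individual in sigma (all of Y if sigma is empty)."""
    return frozenset(Y.intersection(*(R[x] for x in sigma)))

def psi(gamma):
    """psi_R: individuals having every attribute in gamma (all of X if gamma is empty)."""
    return frozenset(x for x, row in R.items() if set(gamma) <= row)

def inferred_attributes(gamma):
    """(phi_R o psi_R)(gamma) minus gamma: attributes inferable from observing gamma."""
    return phi(psi(gamma)) - set(gamma)

def inferred_individuals(sigma):
    """(psi_R o phi_R)(sigma) minus sigma: individuals inferable from observing sigma."""
    return psi(phi(sigma)) - set(sigma)

print(sorted(inferred_attributes({"a"})))   # ['b']: observing a reveals b
print(sorted(inferred_individuals({4})))    # [2, 3]: observing #4 reveals #2 and #3
print(sorted(inferred_attributes(set())))   # []: nothing is inferable "for free" in R
```

Both functions also accept the empty set, in line with the comment above on the empty simplex.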


Any poset $P$ defines a simplicial complex $\Delta(P)$ called the order complex of $P$. The
simplices of $\Delta(P)$ are given by the finite chains $\{p_0 < p_1 < \cdots < p_n\}$ in $P$. Suppose
we start with a simplicial complex $\Sigma$, construct its face poset $\mathfrak{F}(\Sigma)$, and then construct the
order complex $\Delta(\mathfrak{F}(\Sigma))$. The result is isomorphic to the first barycentric subdivision of $\Sigma$. A
convenient visualization of the face posets $\mathfrak{F}(\Phi_R)$ and $\mathfrak{F}(\Psi_R)$ therefore is to draw the barycentric
subdivisions of $\Phi_R$ and $\Psi_R$, respectively, as in Figure 4.


Figure 4: Order complexes of the face posets of the complexes $\Phi_R$ and $\Psi_R$ shown in Figure 3.

Viewed in the order complexes, functions $\phi_R$ and $\psi_R$ are easy to visualize. They are fully
determined by their action on vertices of the order complexes, as shown in Table 1. (Bear
in mind that each element of $\mathfrak{F}(\Phi_R)$ represents a simplex in $\Phi_R$ but is a vertex in $\Delta(\mathfrak{F}(\Phi_R))$.
Similarly, each element of $\mathfrak{F}(\Psi_R)$ represents a simplex in $\Psi_R$ but is a vertex in $\Delta(\mathfrak{F}(\Psi_R))$.)


$\sigma$         $\phi_R(\sigma)$     $(\psi_R \circ \phi_R)(\sigma)$
{1}         {a,b}       {1}
{2}         {b,c}       {2}
{3}         {c}         {2,3,4}
{4}         {c}         {2,3,4}
{1,2}       {b}         {1,2}
{2,3}       {c}         {2,3,4}
{3,4}       {c}         {2,3,4}
{2,4}       {c}         {2,3,4}
{2,3,4}     {c}         {2,3,4}

$\gamma$         $\psi_R(\gamma)$     $(\phi_R \circ \psi_R)(\gamma)$
{a}         {1}         {a,b}
{b}         {1,2}       {b}
{c}         {2,3,4}     {c}
{a,b}       {1}         {a,b}
{b,c}       {2}         {b,c}

Table 1: The maps $\phi_R$ and $\psi_R$, and their compositions, for relation $R$ of Figure 3.


Using  Table  1  one  can  again  see  how  privacy  loss  might  occur  via  R. 

For instance, the map $\phi_R \circ \psi_R$ gives rise to the closure (i.e., a “pushing up”)

$$\{a\} \;\overset{\psi_R}{\longmapsto}\; \{1\} \;\overset{\phi_R}{\longmapsto}\; \{a, b\},$$

telling us how to infer unobserved attribute b from observed attribute a (in the health study
example of Section 3.1, Alice could infer that Bob has_cancer from knowing that he smokes).
Similarly, for the map $\psi_R \circ \phi_R$,

$$\{4\} \;\overset{\phi_R}{\longmapsto}\; \{c\} \;\overset{\psi_R}{\longmapsto}\; \{2, 3, 4\},$$

leading to association inference (in the payroll example from Section 3.2, the payroll clerk
could infer Bob and Mary’s exposure to valuable information after learning of Julie’s work on
an important project).

Figure 5 indicates the homotopy deformations produced by the maps $\phi_R \circ \psi_R$ and $\psi_R \circ \phi_R$,
while Figure 6 shows the resulting image of each face poset.


Figure 5: Closure operators $\phi_R \circ \psi_R$ and $\psi_R \circ \phi_R$ produce homotopy deformations, indicated
by directed edges. In $\mathfrak{F}(\Phi_R)$, {a} closes up to {a, b}. In $\mathfrak{F}(\Psi_R)$, most of the subsets of {2,3,4}
close up to {2,3,4}. The exception is subset {2}, which does not move.


img($\phi_R \circ \psi_R$):    {a,b}    {b}      {b,c}    {c}

img($\psi_R \circ \phi_R$):    {1}      {1,2}    {2}      {2,3,4}

Figure  6:  Result  of  the  closure  operators  of  Figure  5. 

Observe that these two images are isomorphic. Matching up corresponding elements
produces the poset $P_R$ of Figure 1.


Summary: A relation $R$ produces two simplicial complexes, $\Phi_R$ and $\Psi_R$, one modeling
attributes shared by individuals, the other modeling individuals with common attributes. The
complexes are related by two maps, $\phi_R$ and $\psi_R$, that are homotopy inverses. The compositions
of these maps describe the attribute and association inferences possible via $R$, leveraging
background information someone may have. These inferences are summarized by a poset $P_R$
that pairs sets of individuals with sets of attributes. We may describe $P_R$ as follows:


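Definition 3 (Doubly-Labeled Poset). Let $R$ be a relation on $X \times Y$.

The doubly-labeled poset $P_R$ consists of all pairs of sets $(\sigma, \gamma)$ such that $\emptyset \ne \sigma \in \Psi_R$,
$\emptyset \ne \gamma \in \Phi_R$, $\sigma = \psi_R(\gamma)$, and $\gamma = \phi_R(\sigma)$.

The partial order on $P_R$ is defined by: $(\sigma_1, \gamma_1) \le (\sigma_2, \gamma_2)$ if and only if $\sigma_1 \subseteq \sigma_2$
(and/or, equivalently, $\gamma_1 \supseteq \gamma_2$).

(This definition agrees with our intuition that $P_R$ is both the image $(\psi_R \circ \phi_R)(\mathfrak{F}(\Psi_R))$ and
the image $(\phi_R \circ \psi_R)(\mathfrak{F}(\Phi_R))$, by Appendix B.)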
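
Definition 3 can be made concrete with a short computation. The following sketch closes every nonempty attribute simplex and collects the resulting pairs. It is only an illustration: the dictionary encoding and the function names are assumptions, with the entries of $R$ read off from Table 1 and $R'$ obtained by adding the entry (3, a) as in Figure 2.

```python
from itertools import combinations

R = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"c"}}   # relation R (from Table 1)
R_prime = {**R, 3: {"a", "c"}}                            # R' of Figure 2: adds entry (3, a)

def doubly_labeled_poset(rel):
    """Elements (sigma, gamma) of P_R, found by closing every nonempty simplex of Phi_R."""
    def psi(gamma):
        return frozenset(x for x, row in rel.items() if gamma <= row)

    def phi(sigma):
        return frozenset(set.intersection(*(set(rel[x]) for x in sigma)))

    elements = set()
    for row in rel.values():
        for r in range(1, len(row) + 1):
            for gamma in map(frozenset, combinations(sorted(row), r)):
                sigma = psi(gamma)                 # nonempty, since gamma lies in some row
                elements.add((sigma, phi(sigma)))  # pair sigma with the closure of gamma
    return elements

def leq(p, q):
    """(sigma1, gamma1) <= (sigma2, gamma2) iff sigma1 is a subset of sigma2."""
    return p[0] <= q[0]

for sigma, gamma in sorted(doubly_labeled_poset(R_prime), key=lambda p: sorted(p[0])):
    print(sorted(sigma), sorted(gamma))
```

For $R$ this yields the four elements shown in Figure 6, and for $R'$ the six elements listed in Figure 2.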

4.3  Attribute  and  Association  Privacy 

Here  are  formal  definitions  for  the  intuition  developed  via  the  previous  examples: 

Definition 4 (Attribute Privacy). A relation $R$ preserves attribute privacy
precisely when $\phi_R \circ \psi_R$ is the identity operator on the poset $\mathfrak{F}(\Phi_R) \cup \{\emptyset\}$.

Definition 5 (Association Privacy). A relation $R$ preserves association privacy
precisely when $\psi_R \circ \phi_R$ is the identity operator on the poset $\mathfrak{F}(\Psi_R) \cup \{\emptyset\}$.

Comment: For notational simplicity, we frequently say simply that
$\phi_R \circ \psi_R$ is the identity on $\Phi_R$ and/or that $\psi_R \circ \phi_R$ is the identity on $\Psi_R$.
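
Definitions 4 and 5 translate into a direct test: apply the closure operator to the empty set and to every simplex, and check that nothing moves. The sketch below is one possible encoding, not the report's own code. It assumes relations without blank rows or columns, takes the entries of $R$ from Table 1, and obtains $R'$ by adding the entry (3, a). Association privacy is tested by swapping the roles of rows and columns.

```python
from itertools import combinations

R = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"c"}}   # relation R (from Table 1)
R_prime = {**R, 3: {"a", "c"}}                            # R' of Figure 2: adds entry (3, a)

def transpose(rel):
    """Swap the roles of individuals and attributes (columns become rows)."""
    cols = {}
    for x, row in rel.items():
        for y in row:
            cols.setdefault(y, set()).add(x)
    return cols

def preserves_attribute_privacy(rel):
    """True when phi_R o psi_R fixes the empty set and every simplex of Phi_R."""
    Y = set().union(*rel.values())

    def closure(gamma):
        witnesses = [row for row in rel.values() if gamma <= row]  # rows of psi_R(gamma)
        return Y.intersection(*witnesses)

    simplices = {frozenset(c) for row in rel.values()
                 for r in range(len(row) + 1) for c in combinations(sorted(row), r)}
    return all(closure(g) == g for g in simplices)

def preserves_association_privacy(rel):
    """Dual statement: attribute privacy of the transposed relation."""
    return preserves_attribute_privacy(transpose(rel))

print(preserves_attribute_privacy(R), preserves_association_privacy(R))
print(preserves_attribute_privacy(R_prime), preserves_association_privacy(R_prime))
```

With these inputs, $R$ fails both tests, while $R'$ passes the attribute test and fails the association test, in agreement with the discussion of Figure 2.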


4.4  Disinformation  Example  Re-Revisited 

Recall the relation $R'$ of Figure 2, which is relation $R$ of Figure 1 but with an added entry
of disinformation. Figure 7 displays the resulting Dowker complexes and the actions of the
closure operators. Figure 8 flattens out the poset $P_{R'}$ of Figure 2, so one sees its triangle
structure and how it is the image of the Dowker complexes under the closure operators for $R'$.

Figure 7: The Dowker complexes as well as the order complexes of their face posets for the
relation $R'$ of Figure 2. The closure operator $\phi_{R'} \circ \psi_{R'}$ is the identity on $\mathfrak{F}(\Phi_{R'}) \cup \{\emptyset\}$. The
closure operator $\psi_{R'} \circ \phi_{R'}$ on $\mathfrak{F}(\Psi_{R'}) \cup \{\emptyset\}$ closes many (but not all) subfaces of {2,3,4} up to
{2,3,4}, as indicated by the directed arrows. The result is a poset isomorphic to the poset $P_{R'}$
of Figure 2, drawn again slightly differently in Figure 8. Thus relation $R'$ preserves attribute
privacy but not association privacy.


Figure 8: A flattened view of the doubly-labeled poset $P_{R'}$ from Figure 2. Combined with
Figure 7, this perspective shows how $P_{R'}$ arises as the images of $\mathfrak{F}(\Phi_{R'})$ and $\mathfrak{F}(\Psi_{R'})$ under the
closure operators $\phi_{R'} \circ \psi_{R'}$ and $\psi_{R'} \circ \phi_{R'}$, respectively. (The vertices drawn as bigger dots in
the current figure were higher up in the poset of Figure 2 than those drawn as smaller dots.)


5  The  Face  Shape  of  Privacy 


Figure 9: Relations $R$ and $R'$ of Section 3, along with their attribute complexes $\Phi_R$ and $\Phi_{R'}$.


5.1  Free  Faces 

Figure 9 recapitulates relations $R$ and $R'$ from the previous two sections, along with their
Dowker attribute complexes, $\Phi_R$ and $\Phi_{R'}$, respectively. Recall that in $R$ one could make the
inference a ⇒ b, but no such inference was possible in $R'$.

Support for the inference a ⇒ b in $R$ is evident in $\Phi_R$. No such support is evident in $\Phi_{R'}$.
In particular, observe how vertex a has only one incident edge in $\Phi_R$ but has two incident edges
in $\Phi_{R'}$. The fact that there are two edges in $\Phi_{R'}$, with those edges being maximal simplices,
means, intuitively, that vertex a is being “pulled” in two different inference directions, so one
cannot conclude anything additional from a. In contrast, in $\Phi_R$, a is being “pulled” only
toward b, so it is possible that a implies b.

The underlying geometry is that of a free face. A simplex $\sigma$ of a simplicial complex $\Sigma$ is
said to be a free face of $\Sigma$ if it is a proper subset of exactly one maximal simplex of $\Sigma$. That
is true for {a} in $\Phi_R$ but not for {a} in $\Phi_{R'}$.

Of course, vertex {c} also forms a free face in $\Phi_R$, yet one cannot make any inferences upon
observing just c. So, what is going on? The difference is that c is itself the only attribute of
some individual in $R$. Even though {c} is technically a free face of $\Phi_R$, it is not really free to
move under the closure operator $\phi_R \circ \psi_R$, whereas {a} is. Observe that individuals #2, #3,
and #4 all have attribute c, but only individual #2 has additional attributes. This means
that individuals #3 and #4 cannot ever be identified; they have effectively “camouflaged”
themselves with individual #2.

If  one  disallows  or  disregards  such  camouflage,  then  the  idea  of  a  free  face  and  privacy  loss 
are  equivalent.  The  following  definition  is  useful: 

Definition 6 (Unique Identifiability). Let $R$ be a relation on $X \times Y$ and suppose $x \in X$.

We say that $x$ is uniquely identifiable via relation $R$ when $\psi_R(Y_x) = \{x\}$.

Suppose $R$ is a relation. Appendix C proves that if $\Phi_R$ has no free faces, then $R$ preserves
attribute privacy. For the converse, Appendix C further proves that if $R$ preserves attribute
privacy and if every individual is uniquely identifiable, then $\Phi_R$ has no free faces. (Dual
statements hold for association privacy.)
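
The two notions just introduced are easy to compute for small relations. The sketch below is a hedged illustration using the free-face definition exactly as stated above (a proper subset of exactly one maximal simplex) and Definition 6. The encoding and the function names are assumptions; the entries of $R$ come from Table 1 and $R'$ adds the entry (3, a).

```python
from itertools import combinations

R = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"c"}}   # relation R (from Table 1)
R_prime = {**R, 3: {"a", "c"}}                            # R' of Figure 2: adds entry (3, a)

def maximal_simplices(rel):
    """Maximal simplices of Phi_R: the rows not properly contained in another row."""
    rows = {frozenset(row) for row in rel.values()}
    return {m for m in rows if not any(m < other for other in rows)}

def free_faces(rel):
    """Nonempty simplices of Phi_R lying properly inside exactly one maximal simplex."""
    maximal = maximal_simplices(rel)
    simplices = {frozenset(c) for m in maximal
                 for r in range(1, len(m) + 1) for c in combinations(sorted(m), r)}
    return {s for s in simplices if sum(s < m for m in maximal) == 1}

def uniquely_identifiable(rel, x):
    """Definition 6: psi_R(Y_x) = {x}."""
    return {z for z, row in rel.items() if rel[x] <= row} == {x}

print(sorted(map(sorted, free_faces(R))))              # [['a'], ['c']]
print(sorted(map(sorted, free_faces(R_prime))))        # []
print([x for x in R if uniquely_identifiable(R, x)])   # [1, 2]
```

The output reflects the camouflage effect described above: {c} is a free face of the attribute complex of $R$, yet individuals #3 and #4 are not uniquely identifiable.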

5.2  Privacy  versus  Identifiability 

Section 5.1 hinted at the difference between privacy and identifiability. In relation $I$ below
(“I” for “individuality” or “identity”), every individual has exactly one attribute that uniquely
identifies that individual. Relation $I$ preserves privacy fully. It is impossible to make any
attribute inferences. If Bob reveals that he has attribute $y_{\text{Bob}}$, then Alice cannot infer any
additional attributes for Bob. He has himself revealed everything about himself that there is
to know, as far as relation $I$ is concerned.


I      y_1   y_2   ...   y_n
x_1     •
x_2           •
...                ...
x_n                       •

In  contrast,  all  individuals  in  relation  C  (for  “conformism”)  have  exactly  the  same  set  of 
attributes.  As  a  result,  there  is  no  privacy:  one  can  predict  all  the  attributes  of  any  individual  in 
the  relation  without  making  any  observations.  On  the  other  hand,  no  individual  is  identifiable. 


C      y_1   y_2   ...   y_n
x_1     •     •    ...    •
x_2     •     •    ...    •
...
x_n     •     •    ...    •

Homogeneity:  Relation  C  exhibits  a  form  of  homogeneity  often  sought  by  anonymization 
or  other  privacy  techniques.  As  we  have  suggested  before,  the  utility  of  relation  C  is  essentially 
zero,  unless  one  makes  the  entries  stochastic,  so  that  some  utility  is  encoded  in  the  distribution. 

The  discussion  of  free  faces  in  Section  5.1  suggests  an  alternative  approach  to  homogeneity: 
one  may  preserve  privacy  and  retain  utility  by  choosing  the  geometry  of  the  relation 
appropriately,  for  instance,  so  the  space  exhibits  sphere-like  homogeneity.  There  will  be 
considerable  discussion  of  the  importance  of  spheres  in  the  rest  of  the  report. 

5.3  Spheres  and  Privacy 

The attribute complex $\Phi_{R'}$ of Figure 9 is equal to a boundary complex, namely the boundary
of the full simplex consisting of the attributes {a, b, c}. We will denote boundary complexes
by $\partial(V)$, with $V$ some nonempty set. The simplices of $\partial(V)$ are all proper subsets of $V$.
Boundary complexes are homotopic to spheres, specifically $\partial(V) \simeq \mathbb{S}^{n-2}$, with $n = |V|$. For
$\Phi_{R'}$ of Figure 9, we have that $\Phi_{R'} = \partial(\{a, b, c\}) \simeq \mathbb{S}^1$. (In English: The Dowker attribute
complex is the boundary of a triangle, so homotopic to a circle.)

More generally, if for some relation $R$, $\Phi_R = \partial(Y)$, then $\Phi_R$ cannot have any free faces and
so $R$ preserves attribute privacy.

Privacy and Utility: An important observation is that boundary complexes exhibit
homogeneity but still permit identifiability. If $\Phi_R = \partial(Y)$ and no individual’s attributes are a
subset of another’s attributes, then one can and needs to specify $|Y| - 1$ attributes in order to
identify an individual. The boundary structure ensures that one cannot infer any attributes
by specifying fewer than $|Y| - 1$ attributes, yet retains the ability to identify every individual.

Appendix J.1 gives an example of a contractible space that preserves attribute privacy.
Observe, however, that the number of attributes needed to identify an individual in that
example is considerably less than the total number of attributes in the space. For a boundary
complex, it is just one less.


Preserving  Association  and  Attribute  Privacy:  A  consequence  of  these  observations  is 
that  if  one  wishes  to  preserve  both  attribute  and  association  privacy,  then  one  requires  both 
Dowker  complexes  to  look  like  spheres.  More  specifically,  either  both  Dowker  complexes  are 
linear  cycles  or  both  look  like  boundary  complexes  of  the  same  dimension.  In  the  latter  case, 
the relation is isomorphic to a relation of the following form, in which the diagonal $\{(x_i, y_i)\}$
is blank but all other entries are present:

       y_1   y_2   ...   y_n
x_1           •    ...    •
x_2     •          ...    •
...
x_n     •     •    ...

See Appendix E for further details.
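
A small experiment illustrates the blank-diagonal form and the earlier remark on privacy and utility. The sketch below builds such a relation for n = 4 with hypothetical integer labels, checks that both Dowker complexes are boundary complexes, and counts how many attributes are needed to single out each individual. All function names are illustrative assumptions.

```python
from itertools import combinations

def zero_diagonal_relation(n):
    """Individual i has every attribute j except j == i (labels 0..n-1 are hypothetical)."""
    return {i: frozenset(range(n)) - {i} for i in range(n)}

def transpose(rel):
    """Swap the roles of individuals and attributes."""
    cols = {}
    for x, row in rel.items():
        for y in row:
            cols.setdefault(y, set()).add(x)
    return {y: frozenset(xs) for y, xs in cols.items()}

def is_boundary_complex(rel):
    """True when Phi_R = boundary(Y): its simplices are exactly the proper subsets of Y."""
    rows = set(map(frozenset, rel.values()))
    Y = frozenset().union(*rows)
    every_facet_present = all(any(Y - {y} <= row for row in rows) for y in Y)
    return Y not in rows and every_facet_present

def attributes_needed(rel, x):
    """Smallest number of x's attributes whose only witness is x (None if never identified)."""
    for r in range(len(rel[x]) + 1):
        for gamma in map(frozenset, combinations(sorted(rel[x]), r)):
            if {z for z, row in rel.items() if gamma <= row} == {x}:
                return r
    return None

R4 = zero_diagonal_relation(4)
print(is_boundary_complex(R4), is_boundary_complex(transpose(R4)))   # True True
print([attributes_needed(R4, x) for x in R4])                        # [3, 3, 3, 3]
```

Here every individual requires $|Y| - 1 = 3$ attributes to be identified, consistent with the boundary-complex discussion above.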

5.4  A  Spherical  Non-Boundary  Relation  that  Preserves  Attribute  Privacy 

Consider relation $R$ as in Figure 10. Relation $R$ preserves attribute privacy, since $\Phi_R$ has no
free faces. The relation does not preserve association privacy. In particular, the quadrilaterals
drawn for $\Psi_R$ in the figure are actually tetrahedra. This means that the diagonals of the
quadrilaterals are free faces. For instance, one would expect to infer individuals #1 and #6 as
additional unobserved associates if one observes individuals #3 and #4. Indeed, computing
using the closure operator $\psi_R \circ \phi_R$, we see that:

$$(\psi_R \circ \phi_R)(\{3, 4\}) = \psi_R(\{b\}) = \{1, 3, 4, 6\}.$$

Figure 10: A relation $R$ and its Dowker complexes $\Phi_R$ and $\Psi_R$, each homotopic to the
two-dimensional sphere $\mathbb{S}^2$. (One may view $\Phi_R$ as two party hats glued together. One may view
$\Psi_R$ as a triangular cylinder with endcaps. However, the quadrilaterals drawn for the cylinder
portion of $\Psi_R$ are simply flattened sketches of what are actually solid tetrahedra.)


Relation $R$ has another interesting feature. Even though $\Phi_R$ is not itself a boundary
complex, it is the simplicial join of two boundary complexes:

$$\Phi_R = \partial(\{a, b, c\}) * \partial(\{d, e\}).$$

In fact, we can think of $R$ as $R_1 \cup R_2$, with $R_1$ the restriction of $R$ to the attributes
{a, b, c} and $R_2$ the restriction of $R$ to the attributes {d, e}. This means that we can view
every individual in $R$ as being described by two independent attribute spaces. The attribute
space {d, e} acts like a standard bit; every individual has exactly one of these two attributes.
In contrast, the attribute space {a, b, c} is an “any 2 of 3” type of descriptor. Every individual
has exactly two of these three attributes.

Figure 11 shows the relations $R_1$ and $R_2$ along with their Dowker attribute complexes.

Figure 11: Relation $R$ of Figure 10 decomposes into two disjoint relations $R_1$ and $R_2$ such
that $\Phi_R = \Phi_{R_1} * \Phi_{R_2}$, with $\Phi_{R_1}$ the boundary of a triangle and $\Phi_{R_2}$ two isolated points. This
means every individual in $R$ has attributes that act like two independent coordinates: an “any
2 of 3” component and a bit.

6  Conditional Relations as Simplicial Links

The  decomposition  of  Figures  10  and  11  is  reminiscent  of  stochastic  independence  expressed  as 
multiplication  of  probabilities.  Similarly,  there  is  a  combinatorial  analogue  to  the  notion  of  a 
conditional  probability  distribution.  It  appears  as  the  link  of  a  simplex  in  a  simplicial  complex. 

Given a relation $R$, suppose we have observed attributes $\gamma$ for some unknown individual.
The remaining possible combinations of attributes we might yet observe are described by
the simplicial complex $\mathrm{Lk}(\Phi_R, \gamma) = \{\tau \in \Phi_R \mid \tau \cap \gamma = \emptyset \text{ and } \tau \cup \gamma \in \Phi_R\}$. Interpretation:
$\tau \cap \gamma = \emptyset$ means that $\tau$ consists of as yet unobserved attributes, while $\tau \cup \gamma \in \Phi_R$ means
that there is some individual who has the attributes $\tau$ in addition to the attributes $\gamma$ that have
already been observed.

Figure 12: Relation $Q$ describes the conditional relation resulting from $R$ of Figure 10 upon
observing attribute d. Note that $\Phi_Q = \mathrm{Lk}(\Phi_R, \{d\})$.


For instance, after observing attribute d in relation $R$ of Figure 10, we may conclude that
we are observing one of the individuals in {1,2,3} and that the remaining attributes we might
yet observe are any two attributes drawn from {a, b, c}. We can express these conclusions as
yet another relation, namely the relation $Q$ of Figure 12. Relation $Q$ describes exactly which
individuals could give rise to which attributes, consistent with the observation of d already
made. Thus $\Phi_R$ plays a role much like a probability distribution, while $\mathrm{Lk}(\Phi_R, \{d\})$ plays
the role of a conditional distribution. For another example, suppose we have observed
attribute b in $R$. Then the resulting conditional relation $Q'$ is as in Figure 13.


Figure 13: Relation $Q'$ describes the conditional relation resulting from $R$ of Figure 10 upon
observing attribute b. Here $\Phi_{Q'} = \mathrm{Lk}(\Phi_R, \{b\})$. The attribute space for $Q'$ now factors into
two independent bits: {a, c} constitutes one bit, {d, e} the other. This factoring is conditional
on having observed b.

The formal constructions of conditional relations proceed as follows (a symbol of the form
$R|_W$ means “restrict $R$ to $W$”). See also Appendix C.

Definition 7 (Conditional Attribute Relations). Let $R$ be a relation on $X \times Y$ and suppose
$\gamma \subseteq Y$. The following relation $Q$ models $\mathrm{Lk}(\Phi_R, \gamma)$:

$$Q = R|_{\sigma \times \widetilde{Y}}, \quad \text{with } \sigma = \psi_R(\gamma) \text{ and } \widetilde{Y} = \Bigl(\bigcup_{x \in \sigma} Y_x\Bigr) \setminus \gamma.$$

The Dowker complexes are defined in the standard way, except for this special case:

If $\widetilde{Y} = \emptyset$ and $\sigma \ne \emptyset$, we let $\Phi_Q$ and $\Psi_Q$ be instances of the empty complex $\{\emptyset\}$.

Observe: $\mathrm{Lk}(\Phi_R, \gamma) = \Phi_Q$ (a proof appears in Appendix C).

Comment: If $\gamma \notin \Phi_R$ then $\sigma = \emptyset$ and $Q$ is void, and so $\Phi_Q$ is void, consistent with the
standard definition of $\mathrm{Lk}(\Phi_R, \gamma)$ being void in this situation.
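
The identity $\mathrm{Lk}(\Phi_R, \gamma) = \Phi_Q$ can be checked numerically on an "any 2 of 3 plus a bit" relation. The sketch below is an assumption-laden illustration: the numbering of individuals is hypothetical (chosen so that individuals 1, 2, 3 have attribute d and individual 3 has {b, c, d}, as stated in the text), and the function names are not from the report.

```python
from itertools import combinations

# "Any 2 of {a,b,c}, plus exactly one of {d,e}"; the numbering of individuals is assumed.
R = {1: {"a", "b", "d"}, 2: {"a", "c", "d"}, 3: {"b", "c", "d"},
     4: {"a", "b", "e"}, 5: {"a", "c", "e"}, 6: {"b", "c", "e"}}

def attribute_complex(rel):
    """All simplices of Phi_R, including the empty simplex whenever rel has a row."""
    return {frozenset(c) for row in rel.values()
            for r in range(len(row) + 1) for c in combinations(sorted(row), r)}

def link(simplices, gamma):
    """Lk(K, gamma) = {tau in K : tau is disjoint from gamma and tau | gamma is in K}."""
    gamma = frozenset(gamma)
    return {tau for tau in simplices if not (tau & gamma) and (tau | gamma) in simplices}

def conditional_relation(rel, gamma):
    """Q of Definition 7: rows of sigma = psi_R(gamma), restricted to attributes outside gamma."""
    gamma = frozenset(gamma)
    return {x: row - gamma for x, row in rel.items() if gamma <= row}

Phi_R = attribute_complex(R)
Q = conditional_relation(R, {"d"})
print(sorted(Q))                                     # [1, 2, 3]
print(link(Phi_R, {"d"}) == attribute_complex(Q))    # True
```

When $\gamma$ is not a simplex, `conditional_relation` returns an empty dictionary and both sides are the void complex (an empty set of simplices), mirroring the comment above.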


There is a dual construction for links of individuals $\sigma$ in the Dowker complex modeling
associations:

Definition 8 (Conditional Association Relations). Let $R$ be a relation on $X \times Y$ and suppose
$\sigma \subseteq X$. The following relation $Q$ models $\mathrm{Lk}(\Psi_R, \sigma)$:

$$Q = R|_{\widetilde{X} \times \gamma}, \quad \text{with } \gamma = \phi_R(\sigma) \text{ and } \widetilde{X} = \Bigl(\bigcup_{y \in \gamma} X_y\Bigr) \setminus \sigma.$$

The Dowker complexes are defined in the standard way, except for this special case:

If $\widetilde{X} = \emptyset$ and $\gamma \ne \emptyset$, we let $\Psi_Q$ and $\Phi_Q$ be instances of the empty complex $\{\emptyset\}$.

Observe: $\mathrm{Lk}(\Psi_R, \sigma) = \Psi_Q$.

As we will see in Section 7, the complex $\mathrm{Lk}(\Psi_R, \{x\})$ is useful for characterizing individual
$x$’s attribute privacy. If that seems surprising, observe that $\mathrm{Lk}(\Psi_R, \{x\})$ models the connections
$x$ has to other individuals. Those connections determine whether in $\Phi_Q$, and thus back in
$\Phi_R$, there are attributes of $x$ that are “free to move” under the closure operators.

7  Privacy Characterization via Boundary Complexes

Figure 14: With $R$ as in Figure 10, relation $Q$ describes the conditional relation corresponding
to $\mathrm{Lk}(\Psi_R, \{3\})$. Also shown are the Dowker complexes of $Q$. By design, $\Psi_Q = \mathrm{Lk}(\Psi_R, \{3\})$.
Observe that $\Phi_Q$ is the boundary complex $\partial(\{b, c, d\})$, with {b, c, d} being all of individual #3’s
attributes in relation $R$. That boundary condition characterizes full privacy for an individual.


We observed earlier that every individual in relation $R$ of Figure 10 has full attribute
privacy. We came to that conclusion after observing that $\Phi_R$ has no free faces. In fact, one
can focus in on the privacy of a single individual rather than look at the full relation. Let’s
pick one such individual, say #3, and look at the conditional relation $Q$ that models the link
$\mathrm{Lk}(\Psi_R, \{3\})$, as shown in Figure 14.

Individual #3 has attributes {b, c, d} in $R$. The attribute complex $\Phi_Q$ for $Q$ is the boundary
complex on exactly this set. Interpretation: for any nonempty proper subset of individual #3’s
attributes, some combination of other individuals in $R$ has at least those attributes, but not
all of individual #3’s attributes. Moreover, there is a different combination of individuals for
each proper subset that is missing exactly one of #3’s attributes. That diversity of individuals
ensures #3’s attribute privacy.

The  previous  example  suggests  the  following  characterization:  An  individual  has  full 
attribute  privacy  precisely  when  the  attribute  complex  of  the  individual’s  link  is  the  boundary 
complex  of  the  individual’s  attributes.  Observe  that  this  characterization  is  local  to  the 
individual;  it  does  not  depend  on  other  individuals  having  privacy.  We  now  formalize  this 
intuition.  Proofs  appear  in  Appendix  E. 

Recall Definitions 4 and 6, formalizing the notions of privacy preservation and unique
identifiability, respectively. And recall the semantics of $P_R$, for instance from Definition 3.

Theorem 9 (Individual Attribute Privacy). Let $R$ be a relation on $X \times Y$, with $|X| > 1$.
Suppose $x \in X$ is uniquely identifiable via $R$. Let $Q$ be the relation modeling $\mathrm{Lk}(\Psi_R, x)$.

Then the following three conditions are equivalent:

(a) $R$ preserves attribute privacy for $x$,

(b) $\mathrm{Lk}(\Psi_R, x) \simeq \mathbb{S}^{k-2}$, with $k = |Y_x|$,

(c) $\Phi_Q = \partial(Y_x)$.

The previous theorem generalizes to sets of individuals for sets that are “stable” under the
closure operators, i.e., that appear as the “set of individuals component” in an element of $P_R$:

Theorem 10 (Group Attribute Privacy). Let $R$ be a relation on $X \times Y$.

Suppose $(\sigma, \gamma) \in P_R$, with $\sigma \ne X$. Let $Q$ be the relation modeling $\mathrm{Lk}(\Psi_R, \sigma)$.

Then the following three conditions are equivalent:

(a) $(\phi_R \circ \psi_R)(\gamma') = \gamma'$, for every subset $\gamma'$ of $\gamma$,

(b) $\mathrm{Lk}(\Psi_R, \sigma) \simeq \mathbb{S}^{k-2}$, with $k = |\gamma|$,

(c) $\Phi_Q = \partial(\gamma)$.
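
Conditions (a) and (c) of Theorem 10 are both finitely checkable, so their agreement can be observed on small examples. The sketch below is only an illustration under stated assumptions: it uses the small relations $R$ (entries read off from Table 1) and $R'$ (with the added entry (3, a)), builds $Q$ as in Definition 8, and restricts attention to the pairs $(\psi_R(Y_x), Y_x)$ generated by rows, each of which lies in $P_R$. The function names are hypothetical.

```python
from itertools import combinations

R = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"c"}}   # relation R (from Table 1)
R_prime = {**R, 3: {"a", "c"}}                            # R' of Figure 2: adds entry (3, a)

def subsets(s, proper=False):
    """All subsets of s (including the empty set); only proper subsets if requested."""
    items = sorted(s)
    top = len(items) - 1 if proper else len(items)
    return {frozenset(c) for r in range(top + 1) for c in combinations(items, r)}

def psi(rel, gamma):
    return frozenset(x for x, row in rel.items() if gamma <= row)

def phi(rel, sigma):
    Y = set().union(*rel.values())
    return frozenset(Y.intersection(*(rel[x] for x in sigma)))

def condition_a(rel, gamma):
    """Theorem 10(a): the closure phi_R o psi_R fixes every subset of gamma."""
    return all(phi(rel, psi(rel, g)) == g for g in subsets(gamma))

def condition_c(rel, sigma, gamma):
    """Theorem 10(c): Phi_Q is the boundary complex of gamma, with Q as in Definition 8."""
    others = set().union(*(psi(rel, frozenset({y})) for y in gamma)) - sigma   # X-tilde
    q_rows = [rel[x] & gamma for x in others]
    phi_q = set().union(*(subsets(row) for row in q_rows)) | {frozenset()}
    return phi_q == subsets(gamma, proper=True)

def row_pairs(rel):
    """Pairs (psi_R(Y_x), Y_x); each lies in P_R when the row Y_x is nonempty."""
    return {(psi(rel, frozenset(row)), frozenset(row)) for row in rel.values() if row}

for name, rel in (("R", R), ("R'", R_prime)):
    for sigma, gamma in sorted(row_pairs(rel), key=lambda p: sorted(p[0])):
        print(name, sorted(sigma), sorted(gamma),
              condition_a(rel, gamma), condition_c(rel, sigma, gamma))
```

In each printed line the last two values coincide. The only pair for which both are false is ({1}, {a, b}) in $R$, which is the inference a ⇒ b discussed earlier.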


The  following  lemma  appears  in  Appendix  E: 

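Lemma 11 (Interpreting Local Operators). Let $R$ be a relation on $X \times Y$.

Suppose $(\sigma, \gamma) \in P_R$, with $\sigma \ne X$.

Let $Q$ be the relation on $\widetilde{X} \times \gamma$ that models $\mathrm{Lk}(\Psi_R, \sigma)$ and suppose $\widetilde{X} \ne \emptyset$.

Then, for every $\gamma' \subseteq \gamma$:

(i) If $\gamma' \notin \Phi_Q$, then $\psi_R(\gamma') = \sigma$,

(ii) If $\gamma' \in \Phi_Q$, then $\psi_R(\gamma') \supsetneq \sigma$.

Moreover, in this case:

If $(\phi_Q \circ \psi_Q)(\emptyset) = \emptyset$, then $(\phi_R \circ \psi_R)(\emptyset) = \emptyset$.

If $\gamma' \ne \emptyset$, then $(\phi_Q \circ \psi_Q)(\gamma') = (\phi_R \circ \psi_R)(\gamma')$.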

The lemma says that observations of attributes that are consistent in Q have as
interpretation more individuals in R than just the individuals σ, but if ever those observations
become inconsistent in Q, then one has identified σ in R. Here “inconsistent in Q” means that
the observed attributes are legitimate attributes for Q but do not constitute a simplex of Φ_Q.
(Note: Such observed attributes necessarily constitute a simplex of Φ_R since they are a subset
of γ ∈ Φ_R.)

Moreover, attribute inferences are identical in R and Q for nonempty simplices of Φ_Q.

8 The Meaning of Holes in Relations

We  have  seen  how  spheres  characterize  privacy.  More  generally,  when  working  with  topological 
spaces,  holes  are  significant.  One  wonders  what  holes  mean  for  relations. 

•  Some  holes  arise  as  a  consequence  of  exclusion  between  attributes,  as  we  saw  in  the 
decomposition  of  Figures  10  and  11. 

Sticking with binary exclusions, suppose a group of individuals are described by k bits.
One can model those individuals via a relation containing 2k attributes (one attribute
for each possible bit value). Every individual has exactly k of those 2k attributes. If all
possible 2^k combinations of bit values are represented by individuals in the relation, then
the two Dowker complexes are both homotopic to S^{k-1}, the sphere of dimension k−1. In
fact, the attribute complex is the simplicial join of k copies of S^0, while the complex of
individuals is visualizable as a hypercube in k dimensions, with (k−1)-dimensional subcubes
fattened to be simplices. Figures 15, 16, and 17 depict the cases k = 1, 2, and 3, respectively.

In short, k bits means a hole of dimension k−1, if all possible individuals are actually
present in the relation.

(The lack of an expected hole may mean that the capacity of a relation has not been
exhausted, hinting at possible inference. See Appendix J.3.)

Figure 15: Relation S describes two individuals in terms of a single attribute and its negation.
The topology of the Dowker complexes is S^0.

Figure 16: Relation Q describes four individuals in terms of two attributes and their negations.
The topology of the Dowker complexes is S^1.

Figure 17: Relation R describes eight individuals in terms of three attributes and their
negations. The topology of the Dowker complexes is S^2. The cube faces are actually tetrahedra,
flattened to parallelograms in the drawing.

• Minimal nonfaces (which may or may not be topological holes) suggest restrictions of a
relation to equal-numbered attributes and individuals for whom there is both attribute
and association privacy, within the restricted relation. This observation follows from the
following results (here we assume that each relation has no blank rows or columns):

—  A  relation  with  more  attributes  than  individuals  cannot  fully  preserve  attribute 
privacy. 

—  A  relation  with  more  individuals  than  attributes  cannot  fully  preserve  association 
privacy. 

—  A  relation  that  preserves  both  attribute  and  association  privacy  must  have  the  same 
number  of  attributes  and  individuals.  Moreover,  if  the  relation  is  connected,  then 
both  Dowker  complexes  are  either  linear  cycles  of  the  same  length  or  they  are  both 
boundary  complexes  of  full  simplices,  as  we  indicated  previously. 

See  Appendices  C  and  E  for  further  details  and  proofs. 

Consequently,  minimal  nonfaces  of  a  relation  may  be  viewed  by  restriction  as  descriptions 
of  subrelations  that  preserve  both  attribute  and  association  privacy. 

•  Minimal  nonfaces  can  have  other  context-dependent  meanings.  For  instance,  in  a  certain 
authorship  relation,  knowing  that  each  pair  of  three  individuals  has  written  a  paper 
together  appears  to  be  a  good  predictor  that  all  three  individuals  will  co-author  a  paper 
together  [13].  This  suggests  the  following:  if  one  sees  that  such  an  authorship  hole 
does  not  fill  over  time,  then  one  likely  can  infer  some  kind  of  obstruction,  perhaps  an 
incompatibility  in  the  group  as  a  whole,  or  the  death  of  an  author,  for  instance. 

•  When  designing  relations  or  anonymizing  relations,  these  results  suggest  transformations 
that  create  “bubbly  spaces”  of  some  sort,  in  order  to  retain  identifiability  but  also  reduce 
unwanted inference. Sections 9 and J.2 discuss examples.

• Whatever holes there are in Φ_R and Ψ_R must also show up in the poset P_R, since that
poset is formed by homotopy equivalences from Φ_R and Ψ_R. Interestingly, whereas one
thinks of Φ_R and Ψ_R simply as spaces, one sees a partial order on P_R. Something can
move, “up” or “down”. The elements of P_R are inference-stable, by design. So, what is
this possible motion? It is a dynamic process that describes how information changes
interpretation. For instance, as an individual reveals information about him- or herself,
an observer can attempt to identify the individual, by finding interpretations in P_R of
the information revealed. As the individual reveals additional information, the observer’s
interpretation moves downward in P_R, narrowing the set of individuals.

Holes in the spaces Φ_R and Ψ_R (and thus P_R) constrain how that interpretation moves
downward in P_R. The greater a hole’s dimension, the further a downward path has to
move  before  identifying  an  individual.  One  can  think  of  holes  in  a  relation  much  like 
boulders  in  a  stream.  Eventually,  the  current  of  information  sweeps  past  the  hole,  but 
it  is  forced  to  divert  its  motion,  covering  more  distance.  Moreover,  there  may  be  many 
paths  around  the  hole,  much  like  a  leaf  in  a  stream  may  divert  around  a  boulder  in 
different  directions.  The  individual  can  force  a  particular  path  by  choosing  to  reveal 
attributes  in  a  particular  order. 

Much  of  the  rest  of  the  report  explores  the  implications  of  this  stream  analogy.  The 
analogy  merges  with  the  realization  that  privacy  is  a  dynamic  process,  certain  to  flow 
toward  identification  when  attributes  are  static  or  persistent,  yet  subject  to  channeling 
and  turbulence  when  fluid.  See  in  particular  Section  10  onward. 
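
The k-bit example from the first bullet of this section can be checked numerically on small cases. The Python sketch below is an illustration only: the function names are hypothetical, and it verifies merely that the Euler characteristic of the attribute complex equals 1 + (−1)^{k−1}, the value for a sphere of dimension k−1. That is a necessary consequence of the stated homotopy type, not a proof of it.

```python
from itertools import combinations, product

def bit_relation(k):
    """One individual per k-bit string; attribute (i, b) means 'bit i has value b'."""
    return {bits: {(i, b) for i, b in enumerate(bits)}
            for bits in product((0, 1), repeat=k)}

def attribute_complex(relation):
    """Nonempty attribute sets shared by at least one individual."""
    simplices = set()
    for attrs in relation.values():
        ordered = sorted(attrs)
        for size in range(1, len(ordered) + 1):
            simplices.update(frozenset(c) for c in combinations(ordered, size))
    return simplices

def euler_characteristic(simplices):
    """Alternating count of simplices; a simplex with n vertices has dimension n - 1."""
    return sum((-1) ** (len(s) - 1) for s in simplices)

for k in (1, 2, 3, 4):
    chi = euler_characteristic(attribute_complex(bit_relation(k)))
    # Second printed value is the Euler characteristic of a (k-1)-sphere.
    print(k, chi, 1 + (-1) ** (k - 1))
```

For k = 3 the attribute complex built here is the boundary of an octahedron (6 vertices, 12 edges, 8 triangles), matching the S^2 case of Figure 17.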
9 Change-of-Attribute Transformations

Free  faces  and  holes  in  the  Dowker  complex  can  sometimes  suggest  changes  in  attributes 
that  preserve  desired  information  but  reduce  inference.  Consider  the  “ice-cream  cone”  relation 
C  of  Figure  18  and  the  corresponding  complexes  shown  in  Figure  19.  The  relation  describes 
four  individuals  in  terms  of  the  two-flavor  two-scoop  ice-cream  cones  each  individual  enjoys  at 
a  particular  ice-cream  parlor. 

Figure 18: Four individuals and their preferences for ice-cream cones containing two scoops,
with different flavors (each letter represents a flavor, as indicated). See Figure 19 for the
Dowker complexes.

Figure 19: The Dowker complexes for the relation of Figure 18. Φ_C is a complex whose vertices
are ice-cream cones (two flavors). (For visualization purposes, the complex is flattened, with
the leftmost and rightmost vertices really representing the same ice-cream cone.) Each maximal
simplex is a triangle, labeled with the individual who enjoys the three types of cones comprising
the triangle. Ψ_C is a complex whose vertices are individuals. Each maximal simplex is an edge,
representing a two-flavor two-scoop ice-cream cone that each of two individuals enjoys; the edge
is labeled with the cone flavors. The homotopy type of each complex is S^1 ∨ S^1 ∨ S^1.


Relation C is a typical “2-implies-3” relation: Any two different ice-cream cones fully
identify an individual, thereby implying a third ice-cream cone, as can be seen from either
Dowker complex: In Φ_C, every edge is a free face of its encompassing triangle. Moreover, the
edge is not itself generated by any individual.² The closure operator φ_C ∘ ψ_C must therefore
map every edge to a triangle. Similarly, in Ψ_C, any two edges intersecting at a vertex imply
the third edge incident on that vertex.

² We say an individual x of a relation R generates a simplex γ ∈ Φ_R when γ = Y_x. Similarly, an
attribute y generates a simplex σ ∈ Ψ_R when σ = X_y.

This type of relation models, in the small, inferences such as those reported in [17, 15].
For  instance,  [17]  reported  that  zip  code,  gender,  and  birth  date  were  likely  sufficient  in  1990 
to  identify  87%  of  individuals  in  the  U.S.  That  is  nearly  a  “3-implies-all”  type  of  relation. 
Similarly,  [15]  reported  that  8  movie  ratings  and  dates  were  enough  to  uniquely  identify  99% 
of  viewers  in  the  Netflix  Prize  dataset.  That  is  essentially  an  “8-implies-all”  type  of  relation. 

Let’s focus for a moment on Bob’s neighborhood. That relation, let’s call it B, and its
complexes are depicted in Figure 20. (The relation models St(Ψ_C, {Bob}); see Appendix C.)


Figure  20:  Relation  B  models  Bob’s  neighborhood  in  the  ice-cream  relation  of  Figure  18. 
Each  maximal  simplex  is  labeled  with  its  generator.  Generators  of  non-maximal  simplices  are 
indicated  in  parentheses. 

As in C, seeing someone eat one ice-cream cone is not enough to identify anyone in B.
Seeing someone (in this case Bob) eat two different types of ice-cream cones is sufficient to
infer the third type of ice-cream cone that individual prefers. How might we prevent this? We
observe that the vertices of Φ_B are themselves generated by individuals while the edges are
not. Homotopically, therefore, we want to expand the vertices of Φ_B into edges, and contract
the edges of Φ_B into vertices. One possible way to accomplish this is to take logical ORs of
the existing attributes. With ∨ meaning Boolean OR, we define:

α = gc ∨ gs, β = gc ∨ cs, γ = gs ∨ cs.

Then relation B becomes B′ as in Figure 21. The result is that the free faces of Φ_B′ now
are generated by other individuals, so even though they are free, the closure operator does not
move them. In fact, the closure operator φ_B′ ∘ ψ_B′ is the identity on every simplex of Φ_B′
and on ∅, meaning that no attribute inference is possible in B′.


[Relation B′, with attributes α, β, γ: Bob has all three attributes; Alice, David, and Cindy
each have two of the three.]

Figure 21: Relation B′ represents relation B of Figure 20, now with a coordinate transformation
for the attributes. Simplices are again labeled by generators.

Now imagine performing similar operations for all four individuals of relation C from
Figure 18. One winds up constructing four logical ORs:

gc ∨ gs ∨ gv, gc ∨ cs ∨ cv, gs ∨ cs ∨ sv, cv ∨ sv ∨ gv.

Two  observations: 

1.  Each  OR  describes  three  ice-cream  cones  that  form  a  hole  in  the  complex  of  Fig.  19. 

2. Each such hole may be interpreted as a single flavor, namely the flavor in common to the
three ice-cream cones appearing in the OR. For instance, “ginger” (abbreviated as g) is
the common flavor for the OR gc ∨ gs ∨ gv.

In  order  to  describe  the  resulting  relation,  it  is  perhaps  easiest  to  express  those  four  new 
coordinates  themselves  via  a  relation  S  that  describes  the  scoops  present  in  an  ice-cream  cone: 


  S  |  g   c   s   v
 ----+----------------
  gc |  •   •
  gs |  •       •
  cs |      •   •
  cv |      •       •
  sv |          •   •
  gv |  •           •

Finally, to perform the coordinate transformation, one simply multiplies Boolean matrices,
with addition being Boolean OR and multiplication being Boolean AND: F = CS. The relation
F and its complexes appear in Figure 22.
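
The Boolean product F = CS can be sketched in a few lines of Python. This is an illustration under stated assumptions: Bob's row (gc, gs, cs) is taken from the discussion of relation B above, while the other three rows carry hypothetical labels (`person_2`, `person_3`, `person_4`) and assume that each remaining individual enjoys the three cones formed from one triple of flavors, which is consistent with the description of F but is not read from Figure 18.

```python
CONES = ["gc", "gs", "cs", "cv", "sv", "gv"]
FLAVORS = ["g", "c", "s", "v"]

# Relation S: a cone has a flavor exactly when the flavor letter occurs in its name.
S = [[flavor in cone for flavor in FLAVORS] for cone in CONES]

# Assumed rows of relation C (only Bob's row comes from the text).
C_ROWS = {
    "Bob": {"gc", "gs", "cs"},
    "person_2": {"gc", "cv", "gv"},  # hypothetical label and row
    "person_3": {"gs", "sv", "gv"},  # hypothetical label and row
    "person_4": {"cs", "cv", "sv"},  # hypothetical label and row
}
C = [[cone in cones for cone in CONES] for cones in C_ROWS.values()]

def boolean_product(a, b):
    """Matrix product with OR as addition and AND as multiplication."""
    return [[any(a[i][k] and b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

F = boolean_product(C, S)
for name, row in zip(C_ROWS, F):
    print(name, [flavor for flavor, has in zip(FLAVORS, row) if has])
```

Under these assumptions every row of F contains exactly three of the four flavors, which is the pattern of the boundary of a tetrahedron.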


Figure 22: Relation F describes the individual flavors each individual prefers. Φ_F is the
boundary of a tetrahedron, with flavors as vertices. Ψ_F is the Dowker dual of Φ_F, meaning it
too is the boundary of a tetrahedron, now with the roles of flavors and individuals interchanged.
For both Φ_F and Ψ_F, each maximal simplex is a triangle, labeled with its generator.


Relation F represents a description of the four individuals’ preferences in terms of flavors,
not cones. The resulting complexes Φ_F and Ψ_F are now boundary complexes of full simplices,
each homeomorphic to S^2. These complexes have no free faces, so no inference is possible.
Observe further that Φ_F is homotopic to what one obtains from Φ_C by filling the S^1-holes.
Indeed, this idea implicitly motivated our construction, as a way to remove free faces. Similarly,
Ψ_F is isomorphic to what one obtains from Ψ_C by filling its S^1-holes.

One  should  ask  how  this  approach  might  generalize.  The  answer  is  mixed.  The  idea 
of  removing  free  faces  is  central.  There  are  many  ways  to  accomplish  that,  with  relational 
composition  being  but  one  method.  One  issue  with  logical  ORs  is  that  it  is  very  easy  to  obtain 
an  OR  that  is  always  True,  at  which  point  the  resulting  attribute  is  of  little  use. 

Even  with  more  general  transformations,  there  remains  the  issue  of  whether  the  new 
attributes  are  grounded  in  what  is  actually  measurable.  In  the  ice-cream  example,  it  was 
fortunate  that  cones  decomposed  naturally  into  flavors.  It  is  at  least  plausible  that  someone 
might  merely  observe  the  flavors  a  customer  prefers,  not  the  combinations  of  flavors  as  cones. 
If,  however,  only  cones  can  be  observed,  then  one  is  forced  to  deal  with  relation  C  as  given. 

10 Leveraging Lattices for Privacy Preservation

This section examines more carefully the poset P_R along with its lattice structure, leading to
the idea of informative attribute release sequences. These are attributes that an individual
can release in a particular order so as to prevent inference of any attributes yet to be released
via the sequence. The lattice length therefore describes the extent to which an individual can
defer identification. Homology in P_R provides lower bounds on that length.


10.1  Attribute  Release  Order 


Relation G of Figure 23 describes hypothetical co-authorships among five authors in producing
travel guides for five European cities. Each collaboration consists of three authors working
together on one of the five travel guides.

  G           | Athens  Berlin  Caen  Dublin   E
 -------------+----------------------------------
  1 (Alice)   |   •       •                    •
  2 (Ben)     |   •       •      •
  3 (Claire)  |           •      •      •
  4 (David)   |                  •      •      •
  5 (Eric)    |   •                     •      •

Figure 23: A relation G describing co-authorship of travel guides. The Dowker complexes are
dual triangulations of the Möbius strip, with S^1 homotopy type. (Notes: Integers indicate
authors, letters indicate cities via first letter abbreviations. Some vertices and edges appear
twice for ease of viewing. Each maximal simplex is labeled with its generating author or city.)


Suppose  in  casual  conversation  a  person  mentions  that  he/she  worked  on  producing  a  travel 
guide  for  Berlin.  In  the  context  of  relation  G,  that  information  means  the  author  is  one  of 
{Alice, Ben, Claire}. If the author further mentions working on the travel guide for Dublin,
then  that  identifies  the  author  as  Claire.  Equivalently,  the  listener  can  infer  that  the  author 
also  helped  write  the  travel  guide  for  Caen.  (This  form  of  inference  was  the  source  of  problems 
for  the  Netflix  Prize  [15].) 

Claire  was  a  co-author  on  three  travel  guides,  for  Berlin,  Caen,  and  Dublin.  Now 
consider  the  different  possible  sequential  ways  in  which  Claire  might  reveal  which  books  she 
helped  co-author,  along  with  the  point  at  which  her  identity  becomes  known  (see  Figure  24). 

Of  the  six  possible  ways,  four  do  not  fully  identify  Claire  until  she  has  revealed  all  three 
books  that  she  co-authored.  However,  two  of  the  possible  six  release  sequences  do  allow  a 
listener  to  identify  the  author  and  infer  an  additional  book  that  she  co-authored. 
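
The count of four slow and two fast release sequences can be reproduced by brute force. The Python sketch below is illustrative only: the rows of G are those implied by the lattice of Figure 25 (cities abbreviated by first letter), and the helper names `candidates` and `identification_step` are hypothetical.

```python
from itertools import permutations

# Author -> cities, as implied by the lattice elements listed in Figure 25.
G = {
    "Alice": {"E", "A", "B"},
    "Ben": {"A", "B", "C"},
    "Claire": {"B", "C", "D"},
    "David": {"C", "D", "E"},
    "Eric": {"D", "E", "A"},
}

def candidates(relation, released):
    """Individuals who have every attribute released so far."""
    return {x for x, attrs in relation.items() if set(released) <= attrs}

def identification_step(relation, sequence):
    """Number of attributes released when exactly one candidate remains, else None."""
    for i in range(1, len(sequence) + 1):
        if len(candidates(relation, sequence[:i])) == 1:
            return i
    return None

for seq in permutations(sorted(G["Claire"])):
    print(seq, identification_step(G, seq))
```

The two orderings in which Berlin and Dublin are the first two cities mentioned identify Claire after two releases; the other four require all three.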

This example shows how inference may be a dynamic process. While a consumer of data
may wish to identify Claire with as little information as possible, the author herself may wish
to delay that identification for as long as possible (perhaps for reasons of public mystery in
selling books). In the example, the minimal length of an identifying attribute release sequence
is two, while the maximal length is three. If Claire can control how information is released,
then she can choose to reveal what might otherwise be inferred, namely that she co-authored
a travel guide to Caen, thereby delaying her identification.

DISTRIBUTION A: Distribution approved for public release.

  Release order                  Inferable travel guide
  1. Berlin, Caen, (Dublin)
  2. Caen, Berlin, (Dublin)
  3. Dublin, Caen, (Berlin)
  4. Caen, Dublin, (Berlin)
  5. Berlin, (Dublin), Caen      Caen
  6. Dublin, (Berlin), Caen      Caen

Figure 24: This figure shows the six possible sequential ways in which author #3 (Claire) of
Figure 23 can mention the cities for which she co-authored travel guides. The point at which
her identity becomes known in any such release sequence is circled (shown above in parentheses).
If Claire does not mention Caen, one can infer, via relation G of Figure 23, that she co-authored
a travel guide for that city as soon as she mentions the other two cities, Berlin and Dublin, in
either order.

Finally,  we  observe  that  the  order  of  attributes  released  may  or  may  not  matter.  In  the 
travel  guide  example,  Claire  should  mention  Caen  before  the  end  of  her  revelations  (if  she 
wants  to  delay  her  identification),  but  the  order  of  cities  mentioned  is  otherwise  irrelevant. 
The topology of the doubly-labeled poset P_G encodes this order (in)dependence, as we will see
shortly.  Indeed,  much  of  the  remainder  of  this  report  examines  the  connection  between  the 
topology  of  a  relation’s  doubly-labeled  poset  and  the  length  of  attribute  release  sequences. 

10.2  Inferences  on  a  Lattice 

The  doubly-labeled  poset  of  a  relation  produces  a  lattice  [21],  as  follows: 

Definition 12 (Galois Lattice). Let R be a relation on X × Y, with X ≠ ∅ and Y ≠ ∅. Let
P_R be the associated doubly-labeled poset.

(Recall from Definition 3 on page 20 that an element of P_R is a pair (σ, γ), with
∅ ≠ σ = ψ_R(γ) ∈ Ψ_R and ∅ ≠ γ = φ_R(σ) ∈ Φ_R.

We previously defined a partial order on P_R by (σ_1, γ_1) ≤ (σ_2, γ_2) iff σ_1 ⊆ σ_2 (iff γ_1 ⊇ γ_2).)

P_R may already contain a bottom element of the form (σ, Y), with σ those individuals in X
who have all the attributes in Y. If not, we adjoin (∅, Y) to the bottom of P_R.

P_R may already contain a top element of the form (X, γ), with γ those attributes in Y that
every individual in X has. If not, we adjoin (X, ∅) to the top of P_R.

We refer to the resulting poset as the Galois lattice P_R^+. It has lattice operations ∨ and ∧:

(σ_1, γ_1) ∨ (σ_2, γ_2) = ((ψ_R ∘ φ_R)(σ_1 ∪ σ_2), γ_1 ∩ γ_2),

(σ_1, γ_1) ∧ (σ_2, γ_2) = (σ_1 ∩ σ_2, (φ_R ∘ ψ_R)(γ_1 ∪ γ_2)).

We sometimes refer to the bottom element of P_R^+ by 0_R and to the top element by 1_R.

                              (12345, ∅)

     (512, A)    (123, B)    (234, C)    (345, D)    (451, E)

 (51, EA)    (12, AB)    (23, BC)    (34, CD)    (45, DE)    (51, EA)

     (1, EAB)    (2, ABC)    (3, BCD)    (4, CDE)    (5, DEA)

                              (∅, ABCDE)


Figure 25: The lattice P_G^+ for the travel guide relation of Figure 23. Each element is a pair of
sets (σ, γ) such that σ = ψ_G(γ) and γ = φ_G(σ). (We have elided commas and braces in sets,
for ease of viewing.) The lattice operations model inferences possible from observations. For
instance, (123, B) ∧ (345, D) = (3, BCD), meaning that observation of attributes B and D permits
inference of additional attribute C and identification of author #3. (In Figure 23, attribute
C is the travel guide for Caen and author #3 is Claire.) The lattice wraps around, with
element (51, EA) duplicated for ease of viewing. If one removes the top and bottom elements,
the remaining poset P_G has S^1 homotopy type, just like the Möbius strip.


Figure 25 shows the lattice P_G^+ for the travel guide relation of Figure 23. Observe how the
lattice encodes attribute and association inferences (or lack thereof) via its lattice operations.
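
The lattice operations of Definition 12 are easy to evaluate mechanically. The Python sketch below is a minimal illustration, not an implementation from the report: the rows of G are read off the elements of Figure 25, and the names `phi`, `psi`, `meet`, `join`, and `element` are hypothetical helpers standing for φ_G, ψ_G, ∧, ∨, and the lattice element determined by a set of attributes.

```python
# Author index -> cities (first-letter abbreviations), as implied by Figure 25.
G = {1: frozenset("EAB"), 2: frozenset("ABC"), 3: frozenset("BCD"),
     4: frozenset("CDE"), 5: frozenset("DEA")}
X = frozenset(G)
Y = frozenset().union(*G.values())

def phi(sigma):
    """Attributes shared by every individual in sigma (all of Y when sigma is empty)."""
    return frozenset(y for y in Y if all(y in G[x] for x in sigma))

def psi(gamma):
    """Individuals having every attribute in gamma (all of X when gamma is empty)."""
    return frozenset(x for x in X if gamma <= G[x])

def element(gamma):
    """Lattice element (sigma, gamma') obtained by closing a set of attributes."""
    sigma = psi(frozenset(gamma))
    return (sigma, phi(sigma))

def meet(a, b):
    (s1, g1), (s2, g2) = a, b
    return (s1 & s2, phi(psi(g1 | g2)))

def join(a, b):
    (s1, g1), (s2, g2) = a, b
    return (psi(phi(s1 | s2)), g1 & g2)

print(meet(element("B"), element("D")))    # the pair (3, BCD)
print(join(element("BC"), element("CD")))  # the pair (234, C)
```

The first result reproduces the inference described in the caption of Figure 25: observing Berlin and Dublin yields Caen and identifies author #3.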

Special Cases: It can happen that the lattice consists of a single element. For example,
with relation C as on page 23, P_C^+ = P_C = {(X, Y)}. In particular, 0_C = 1_C.

Definition  12  ignores  the  situation  in  which  R  is  void.  One  possibility  is  to  view  P^  as  void 
and  Pr  as  degenerate. 

10.3  Preserving  Attribute  Privacy  for  Sets  of  Individuals 

Theorem  9  on  page  28  described  the  conditions  under  which  an  individual  has  full  attribute 
privacy.  For  such  an  individual,  the  order  in  which  that  individual  (or  anyone)  releases  the 
individual’s  attributes  is  irrelevant.  Any  order  is  fine.  Only  once  all  attributes  have  been 
released,  can  an  observer  definitively  identify  the  individual.  Theorem  10  described  a  similar 
result  for  certain  sets  of  individuals,  including  those  with  whom  an  individual  is  confusable 
after  only  some  of  his/her  attributes  have  been  released. 

Consider Lk(Ψ_G, 3), modeled by relation C as in Figure 26. This relation describes the
authors with whom Claire has collaborated, via their co-authored books. The Dowker
complexes are contractible, so by either Theorem 9 or Theorem 10, we know that some attribute
inference is possible involving Claire. Lemma 11 on page 29 tells us to look for a subset of
{Berlin, Caen, Dublin} that is a simplex of Φ_G but not of Φ_C. As is apparent from the figure,
the set {Berlin, Dublin} satisfies these conditions, consistent with our earlier observations.
Alternatively, looking at P_C^+ in Figure 27, we see that (12, B) ∧ (45, D) = (∅, BCD), allowing
us to draw the same conclusion. Consequently, Claire should be sure to mention her travel
guide for Caen early on, not leave it for last, if she wants to delay identification.

Figure 26: Relation C describes Lk(Ψ_G, 3), the link of Claire in the relation of Figure 23.
(Each maximal simplex in any one complex is labeled with its generating attribute or individual
from the other complex. Generators of non-maximal simplices are indicated in parentheses.)

  P_C^+             (1245, ∅)
                  /     |     \
          (12, B)    (24, C)    (45, D)
                \    /     \    /
               (2, BC)    (4, CD)
                     \    /
                   (∅, BCD)


Figure 27: The lattice P_C^+ for the link of Claire, as given in Figure 26. (Here authors appear
as integer indices and city names appear as first letter abbreviations.) Observe that this lattice
may be viewed as a sublattice of P_G^+, containing all elements that include individual #3 there,
but with that individual removed here.

Now let us take this reasoning one step further. Consider an element of P_G^+ corresponding to
some state just prior to identification of Claire, for instance (23, BC). This element corresponds
to both of the first two release sequences of Figure 24: Claire has mentioned her work regarding
the travel guides for Berlin and Caen, but has not yet mentioned Dublin. Thus there is
still some ambiguity as to her identity (it is either author #2 or author #3). In terms of
Theorem 10 on page 29, σ = {2, 3}, γ = {Berlin, Caen}, and k = 2.

Figure 28 shows the relation describing Lk(Ψ_G, {2, 3}). The Dowker complexes have S^0
homotopy type, thus satisfying the topological conditions of Theorem 10. Consequently, there
is no attribute inference possible in the encompassing relation based on attributes that appear
in the link. That means the order in which Claire releases the two attributes Berlin and
Caen is immaterial. This conclusion is consistent with the conclusion one draws upon explicitly
enumerating all release sequences, as in Figure 24.
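
Conditions (a) and (c) of Theorem 10 can be compared directly on the two links just discussed. The Python sketch below is illustrative only. It assumes that the link of σ is modeled by restricting G to the attributes in γ and to the individuals outside σ, and it compares only nonempty simplices; the function names are hypothetical.

```python
from itertools import combinations

# Author index -> cities (first-letter abbreviations), as implied by Figure 25.
G = {1: frozenset("EAB"), 2: frozenset("ABC"), 3: frozenset("BCD"),
     4: frozenset("CDE"), 5: frozenset("DEA")}
Y = frozenset().union(*G.values())

def psi(gamma):
    return frozenset(x for x, ys in G.items() if gamma <= ys)

def phi(sigma):
    return frozenset(y for y in Y if all(y in G[x] for x in sigma))

def subsets(gamma, proper=False):
    """All subsets of gamma (optionally only the proper ones), including the empty set."""
    top = len(gamma) - 1 if proper else len(gamma)
    return [frozenset(c) for n in range(top + 1)
            for c in combinations(sorted(gamma), n)]

def link_simplices(sigma, gamma):
    """Nonempty simplices of the attribute complex of the assumed link relation."""
    rows = [G[x] & gamma for x in G if x not in sigma]
    return {s for row in rows for s in subsets(row) if s}

def condition_a(gamma):
    """(a): the closure operator fixes every subset of gamma."""
    return all(phi(psi(g)) == g for g in subsets(gamma))

def condition_c(sigma, gamma):
    """(c): the link's attribute complex is the boundary complex of gamma."""
    boundary = {s for s in subsets(gamma, proper=True) if s}
    return link_simplices(sigma, gamma) == boundary

for sigma, gamma in [({3}, "BCD"), ({2, 3}, "BC")]:
    s, g = frozenset(sigma), frozenset(gamma)
    print(sorted(s), sorted(g), condition_a(g), condition_c(s, g))
```

Both conditions fail for Claire alone (the edge {Berlin, Dublin} is missing from her link) and both hold for the pair {2, 3}, in agreement with the discussion above.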

Figure 28: Relation Q describes Lk(Ψ_G, {2, 3}), the combined link of authors #2 and #3 (Ben
and Claire) in the relation of Figure 23. These two authors have together collaborated with
each of authors #1 and #4 (Alice and David) but have not both together collaborated with
author #5 (Eric). The two Dowker complexes are each instances of S^0, so essentially the
same. The corresponding lattice P_Q^+ is also very simple.


10.4  Informative  Attribute  Release  Sequences 

This  subsection  defines  more  precisely  the  idea  of  controlled  information  release.  These 
definitions  will  help  us  better  understand  holes  in  a  relation’s  Dowker  complexes,  via 
Theorem  10.  Subsequently,  Section  11  will  test  that  insight  with  data  from  the  world  wide 
web. 

Definition 13 (Attribute Release Sequence). Let R be a relation on X × Y, with both X and
Y nonempty. An attribute release sequence for R is a nonempty set of attributes from Y
released in a particular sequential order:

y_1, y_2, ..., y_k, with k ≥ 1.

We say that the sequence has length k.

We say that an attribute release sequence is informative if

y_i ∉ (φ_R ∘ ψ_R)({y_1, ..., y_{i-1}}), for all 1 ≤ i ≤ k.

(Note: for i = 1, the requirement states that y_1 ∉ (φ_R ∘ ψ_R)(∅) = φ_R(X).)

(We sometimes use the abbreviation “iars” to mean either “informative attribute
release sequence” or “informative attribute release sequences”.)

Interpretation: When i = 1, the argument to φ_R ∘ ψ_R is the empty set, so the condition
requires that y_1 ∉ φ_R(X). In other words, y_1 may not be any attribute that is shared by
all individuals in X. Any such attribute could be inferred “for free” and thus would not
be informative. Thereafter, the condition requires that any attribute to be released not be
inferable from those already released.

We  are  interested  in  understanding  the  extent  to  which  order  of  release  matters: 

Definition 14 (Isotropy). Let R be a relation on X × Y, with both X and Y nonempty.
Suppose ∅ ≠ γ ⊆ Y.

We say that γ is isotropic if every possible ordering of all the elements in γ forms an
informative attribute release sequence for R.
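
Definitions 13 and 14 translate directly into executable checks. The Python sketch below is a minimal illustration on the travel guide relation; the rows of G are those implied by Figure 25, and `is_informative` and `is_isotropic` are hypothetical helper names.

```python
from itertools import permutations

# Author index -> cities (first-letter abbreviations), as implied by Figure 25.
G = {1: frozenset("EAB"), 2: frozenset("ABC"), 3: frozenset("BCD"),
     4: frozenset("CDE"), 5: frozenset("DEA")}
Y = frozenset().union(*G.values())

def psi(gamma):
    return frozenset(x for x, ys in G.items() if gamma <= ys)

def phi(sigma):
    return frozenset(y for y in Y if all(y in G[x] for x in sigma))

def closure(gamma):
    """The closure operator on attribute sets: phi composed with psi."""
    return phi(psi(frozenset(gamma)))

def is_informative(sequence):
    """Each attribute must not be inferable from the attributes released before it."""
    return all(y not in closure(sequence[:i]) for i, y in enumerate(sequence))

def is_isotropic(gamma):
    """Every ordering of gamma must be an informative attribute release sequence."""
    return all(is_informative(p) for p in permutations(sorted(gamma)))

print(is_informative(("B", "C", "D")))  # Caen is released before it can be inferred
print(is_informative(("B", "D", "C")))  # Caen is already inferable from Berlin and Dublin
print(is_isotropic("BC"), is_isotropic("BD"), is_isotropic("BCD"))
```

Here {Berlin, Caen} and {Berlin, Dublin} are isotropic, while Claire's full attribute set is not, since two of its six orderings release Caen after it has become inferable.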

We are interested in the minimal and maximal lengths of informative attribute release
sequences:

Definition 15 (Identification and Minimal Identification). Let R be a relation on X × Y.

We say that a set of attributes γ ⊆ Y identifies a set of individuals σ ⊆ X in R when
ψ_R(γ) = σ. (We sometimes alternatively say that γ localizes (to) σ.)

We say that γ is minimally identifying (for σ) if both the following conditions hold:

(i) ψ_R(γ) = σ,

(ii) ψ_R(γ′) ⊋ σ for every γ′ ⊊ γ.


Definition 16 (Identification Lengths). Let R be a relation on X × Y, with both X and Y
nonempty. Suppose (σ, γ) ∈ P_R. Define the fast and slow attribute release lengths for σ as

r_fast(σ) = min { |γ′| : γ′ ∈ Φ_R and ψ_R(γ′) = σ },

r_slow(σ) = max { k : y_1, ..., y_k is an iars for R and ψ_R({y_1, ..., y_k}) = σ }.


An argument similar to that in Appendix D shows that the following problem is NP-complete:
Given σ, is there some minimally identifying γ for σ of size at most k?
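
For a relation as small as the travel guide example, the two lengths of Definition 16 can be found by exhaustive search. The Python sketch below is illustrative only: it enumerates all simplices and all ordered attribute sequences, so its running time grows exponentially with the number of attributes, and the function names are hypothetical. Both functions assume that σ occurs in some element of P_G; otherwise the searches are empty and Python raises a ValueError.

```python
from itertools import combinations, permutations

# Author index -> cities (first-letter abbreviations), as implied by Figure 25.
G = {1: frozenset("EAB"), 2: frozenset("ABC"), 3: frozenset("BCD"),
     4: frozenset("CDE"), 5: frozenset("DEA")}
Y = frozenset().union(*G.values())

def psi(gamma):
    return frozenset(x for x, ys in G.items() if gamma <= ys)

def phi(sigma):
    return frozenset(y for y in Y if all(y in G[x] for x in sigma))

def is_informative(sequence):
    return all(y not in phi(psi(frozenset(sequence[:i])))
               for i, y in enumerate(sequence))

def attribute_simplices():
    """Nonempty attribute sets shared by at least one individual."""
    return {frozenset(c) for ys in G.values()
            for n in range(1, len(ys) + 1)
            for c in combinations(sorted(ys), n)}

def r_fast(sigma):
    """Smallest simplex of attributes that identifies sigma."""
    return min(len(g) for g in attribute_simplices() if psi(g) == sigma)

def r_slow(sigma):
    """Longest informative attribute release sequence that identifies sigma."""
    return max(len(seq)
               for n in range(1, len(Y) + 1)
               for seq in permutations(sorted(Y), n)
               if psi(frozenset(seq)) == sigma and is_informative(seq))

claire = frozenset({3})
print(r_fast(claire), r_slow(claire))
```

For Claire this search gives a fast length of two (Berlin and Dublin) and a slow length of three, matching the minimal and maximal lengths found in Section 10.1.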

10.5  Isotropy,  Minimal  Identification,  and  Spheres 

There is no requirement in Definition 13 that an informative attribute release sequence be a
simplex in Φ_R. Indeed, when working with links, it is useful to create informative attribute
release sequences that are not simplices in the link, thereby identifying a set of individuals
in the encompassing relation, as per Lemma 11. However, it is always the case that any
inconsistency arises only with the last attribute released:

Lemma 17 (Almost a Simplex). Let R be a relation on X × Y, with both X and Y nonempty.
Suppose y_1, ..., y_k is an informative attribute release sequence for R.

Then {y_1, ..., y_{k-1}} ∈ Φ_R.

Proof: If {y_1, ..., y_{k-1}} ∉ Φ_R, then (φ_R ∘ ψ_R)({y_1, ..., y_{k-1}}) = φ_R(∅) = Y. Since y_k ∈ Y,
this contradicts the requirement of Definition 13. □

Interestingly,  when  an  informative  attribute  release  sequence  is  a  simplex,  then  being 
isotropic  is  equivalent  to  being  minimally  identifying.  Moreover,  topologically,  we  can 
characterize  this  isotropy  property  as  a  sphere  appearing  via  a  restricted  link: 


DISTRIBUTION  A:  Distribution  approved  for  public  release. 


Final  Report,  AFOSR  Award  FA9550-14-1-0012 




[Figure 29 graphic: relation $Q'$ on individuals 4 (David) and 5 (Eric) with attributes Berlin and Dublin, together with its Dowker complexes.]

Figure 29: Relation $Q' = Q(\sigma, \gamma)$, for the book authorship example of Figure 23, with $\sigma = \{3\}$ and $\gamma = \{\text{Berlin}, \text{Dublin}\}$. Relation $Q'$ describes the link of author #3 (Claire) restricted to the attribute set {Berlin, Dublin}. See Figure 26 for the full link relation. (Each maximal simplex in the Dowker complexes is again labeled with its generating individuals or attribute.)


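Definition 18 (Restricted Link). Let $R$ be a relation on $X \times Y$.

Suppose $\sigma \in \Psi_R$ and $\gamma \subseteq \phi_R(\sigma)$.

Define relation $Q(\sigma, \gamma)$ as follows:

$$Q(\sigma, \gamma) = R|_{\overline{X} \times \gamma}, \quad \text{with} \quad \overline{X} = \bigcup_{y \in \gamma} X_y \setminus \sigma.$$

The Dowker complexes are defined in the standard way, except for these special cases:

If $\sigma = X$, we let $\Psi_{Q(\sigma,\gamma)}$ and $\Phi_{Q(\sigma,\gamma)}$ be instances of the void complex $\emptyset$.

If $\sigma \neq X$ but $\overline{X} = \emptyset$, we let $\Psi_{Q(\sigma,\gamma)}$ and $\Phi_{Q(\sigma,\gamma)}$ be instances of the empty complex $\{\emptyset\}$.

We say that $Q(\sigma, \gamma)$ models the link of $\sigma$ restricted to $\gamma$.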


Comment: Although the previous definition looks similar to that for $\mathrm{Lk}(\Psi_R, \sigma)$ on page 27, there are some differences: (a) Here, we require that $\sigma$ be a simplex in $\Psi_R$. (b) Here, we do not assume $\gamma = \phi_R(\sigma)$, merely $\gamma \subseteq \phi_R(\sigma)$. (c) Finally, Definition 8 on page 27 creates an empty complex whereas the current definition creates a void complex when $\sigma = X \in \Psi_R$. In summary: When $\sigma \neq X$, $Q(\sigma, \gamma)$ models those simplices of $\mathrm{Lk}(\Psi_R, \sigma)$ that are witnessed by attributes in $\gamma$.

Theorem 19 (Isotropy = Minimal Identification = Sphere). Let $R$ be a relation and suppose $\emptyset \neq \gamma \in \Phi_R$. Let $\sigma = \psi_R(\gamma)$. Then the following four conditions are equivalent:

(a) $\gamma$ is isotropic.

(b) $\gamma$ is minimally identifying (for $\sigma$).

(c) $\Psi_{Q(\sigma,\gamma)} \simeq \mathbb{S}^{k-2}$, with $k = |\gamma|$.

(d) $\Phi_{Q(\sigma,\gamma)} = \partial(\gamma)$.

See  Appendix  F.3  for  a  proof. 
Collaboration Example Revisited: To illustrate Theorem 19, consider again the example of Figure 23. Recall that together the travel guides for Berlin and Dublin identify Claire. Indeed, {Berlin, Dublin} is a minimally identifying set of books for Claire. It is isotropic, as Figure 24 shows. Figure 29 depicts the link of Claire restricted to {Berlin, Dublin}, modeled by relation $Q'$. Observe that $\Phi_{Q'} = \partial(\{\text{Berlin}, \text{Dublin}\})$ and that $\Psi_{Q'} \simeq \mathbb{S}^0$, as the theorem asserts.

10.6  Poset  Lengths  and  Information  Release 

We have seen how minimal identification appears topologically via spheres. Spheres are isotropic so perhaps it is not surprising that they encode isotropic attribute release sequences. We cannot therefore expect a spherical characterization for the problem of finding a maximally long informative attribute release sequence. Instead, we find an answer in the combinatorial structure of the doubly-labeled poset $P_R$ and its lattice $P_R^{+}$. We summarize the key results below. For proofs, see Appendix F.

Lemma 20 (Informative Attributes from Maximal Chains). Let $R$ be a relation on $X \times Y$, with both $X$ and $Y$ nonempty. Suppose $\{(\sigma_k, \gamma_k) < \cdots < (\sigma_1, \gamma_1) < (\sigma_0, \gamma_0)\}$, with $k \geq 1$, is a maximal chain in $P_R^{+}$.

Define $y_1, \ldots, y_k$ by selecting some $y_i \in \gamma_i \setminus \gamma_{i-1}$, for each $i = 1, \ldots, k$.

Then $y_1, \ldots, y_k$ is an informative attribute release sequence for $R$.

Moreover, $(\phi_R \circ \psi_R)(\{y_1, \ldots, y_i\}) = \gamma_i$ for each $i = 0, 1, \ldots, k$.

(Notes: (a) For a maximal chain, $\gamma_k = Y$ and $\sigma_0 = X$. (b) The hypothesis $k \geq 1$ excludes any relation $R$ for which $\hat{0}_R = \hat{1}_R$.)

Lemma  20  implies  that  every  maximal  chain  in  the  doubly-labeled  poset  associated  with  a 
relation  gives  rise  to  an  informative  attribute  release  sequence  that  tracks  the  chain. 

A  partial  converse  holds  as  well: 

Lemma 21 (Chains from Informative Attributes). Let $R$ be a relation on $X \times Y$, with both $X$ and $Y$ nonempty. Suppose $y_1, \ldots, y_k$ is an informative attribute release sequence for $R$, with $k \geq 1$.

Let $\gamma_i = (\phi_R \circ \psi_R)(\{y_1, \ldots, y_i\})$ and $\sigma_i = \psi_R(\gamma_i)$, for $i = 1, \ldots, k$.

Then $\{(\sigma_k, \gamma_k) < \cdots < (\sigma_1, \gamma_1) < (X, \gamma_0)\}$ is a (not necessarily maximal) chain in $P_R^{+}$, with $\gamma_0 = \phi_R(X)$.

Consequently one can obtain all informative attribute release sequences as subsequences of those constructed from maximal chains in $P_R^{+}$.

Comment about "length": The length $\ell(P)$ of a poset $P$ is defined to be one less than the number of elements in the longest chain of the poset [18]. The length of an informative attribute release sequence $y_1, \ldots, y_k$ is $k$. These definitions match much like the dimension of a simplex is one less than the number of its elements.

Corollary 22 (Maximal Length). The maximum length of an informative attribute release sequence for a relation $R$ is $\ell(P_R^{+})$.
Corollary 23 (Maximal Identification Length). Suppose $R$ is a relation such that no attribute is shared by all individuals. For any $(\sigma, \gamma) \in P_R$, $r_{\mathrm{slow}}(\sigma) = \ell(P_{Q(\sigma,\gamma)}) + 2$.
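Corollary 22 can be explored numerically with a small sketch. It assumes that Definition 13 requires each released attribute to lie outside the closure $(\phi_R \circ \psi_R)$ of the attributes released before it, which is how the proof of Lemma 17 uses that definition. The relations `T` and `SAME` and all function names are hypothetical illustrations; `T` is a tetrahedral-style relation in which each individual lacks exactly one attribute.

```python
from functools import lru_cache

def make_closure(relation):
    """Return (Y, closure), where closure(gamma) = phi_R(psi_R(gamma))."""
    Y = frozenset().union(*relation.values())

    def psi(gamma):
        # Individuals having all attributes in gamma; psi(empty) = X.
        return frozenset(x for x, ys in relation.items() if gamma <= ys)

    def phi(sigma):
        # Attributes shared by all individuals in sigma; phi(empty) = Y.
        shared = Y
        for x in sigma:
            shared = shared & relation[x]
        return shared

    def closure(gamma):
        return phi(psi(frozenset(gamma)))

    return Y, closure

def max_release_length(relation):
    """Length of a longest informative attribute release sequence (by exhaustive search)."""
    Y, closure = make_closure(relation)

    @lru_cache(maxsize=None)
    def longest(closed):
        # Each informative release adds an attribute not yet inferable,
        # then passes to the closure of what has been released.
        best = 0
        for y in Y - closed:
            best = max(best, 1 + longest(closure(closed | {y})))
        return best

    return longest(closure(frozenset()))

# Hypothetical tetrahedral-style relation: each individual lacks one attribute.
T = {1: {"b", "c", "d"}, 2: {"a", "b", "d"}, 3: {"a", "c", "d"}, 4: {"a", "b", "c"}}
# Hypothetical relation in which all individuals share all attributes.
SAME = {1: {"a", "b", "c"}, 2: {"a", "b", "c"}, 3: {"a", "b", "c"}}
```

For `T` the search returns 4, matching a Boolean lattice of length 4. For `SAME` it returns 0, since every attribute is already inferable before anything is released.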

Collaboration Example Re-Revisited: Returning again to the travel guide example, observe in Figure 25 that $\ell(P_G^{+}) = 4$. This tells us, by Corollary 22, that a longest informative attribute release sequence for relation $G$ contains four attributes. Indeed, we can pick three attributes to identify an individual, and then a fourth to form an inconsistency. How do we know that we can choose three attributes informatively to identify an individual? For example, $\mathrm{Lk}(\Psi_G, \text{Claire})$ is shown in Figure 26 with associated lattice $P_Q^{+}$ in Figure 27. In this case $\ell(P_Q) + 2 = \ell(P_Q^{+}) = 3$. Moreover, by the construction of Lemma 20, we can read off four different such informative sequences, namely the first four sequences appearing in Figure 24.

We thus see that $r_{\mathrm{slow}}(\text{Claire}) = 3$, and as we have seen previously, $r_{\mathrm{fast}}(\text{Claire}) = 2$. In other words, if Claire has control over how to release information, she can draw out identification for three books, while the fastest anyone can identify her is via two books.


Figure 30: Relation $T$ describes four individuals with four attributes, with Dowker complexes that are boundary complexes of tetrahedra, meaning they have homotopy type $\mathbb{S}^2$.

In contrast, consider the tetrahedral relation of Figure 30. The Dowker complexes are boundary complexes, so we know that no attribute or association inference is possible. This is evident from the lattice $P_T^{+}$ depicted in Figure 31 as well. It has length 4, just as did the travel guide lattice, but the inference structure is now different. For any $(\sigma, \gamma) \in P_T$, with $Q = Q(\sigma, \gamma)$ modeling $\mathrm{Lk}(\Psi_T, \sigma)$ on attributes $\gamma$, we see that $\Phi_Q = \partial(\gamma)$ and thus that $\ell(P_Q^{+}) = \ell(P_Q) + 2 = |\gamma|$. This tells us, by Theorem 19 and Corollary 23, that $r_{\mathrm{fast}}(\sigma) = r_{\mathrm{slow}}(\sigma) = |\gamma|$, as one would expect in an inference-free world. For a specific instance, Figure 32 describes $Q = Q(\{3\}, \{a, c, d\})$ along with $Q$'s Dowker complexes and the lattice $P_Q^{+}$.
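The restricted link construction for this instance can be sketched as follows. This is a minimal illustration, not the report's software: the rows of `T` other than individual 3's row `{a, c, d}` are an assumption (any assignment in which each individual lacks a different attribute would do), the function names are hypothetical, and the special void and empty cases of Definition 18 are not handled.

```python
from itertools import combinations

# Hypothetical tetrahedral relation; only individual 3's row is taken from the text.
T = {1: {"b", "c", "d"}, 2: {"a", "b", "d"}, 3: {"a", "c", "d"}, 4: {"a", "b", "c"}}

def restricted_link(relation, sigma, gamma):
    """Rows of Q(sigma, gamma): individuals outside sigma that have at least
    one attribute in gamma, with their attribute sets restricted to gamma."""
    sigma, gamma = frozenset(sigma), frozenset(gamma)
    return {
        x: frozenset(ys) & gamma
        for x, ys in relation.items()
        if x not in sigma and frozenset(ys) & gamma
    }

def attribute_complex(relation):
    """Nonempty simplices of Phi_Q: attribute sets shared by some individual."""
    simplices = set()
    for ys in relation.values():
        for size in range(1, len(ys) + 1):
            simplices.update(frozenset(c) for c in combinations(sorted(ys), size))
    return simplices

def boundary_complex(gamma):
    """Nonempty proper subsets of gamma, i.e. the boundary of the simplex gamma."""
    gamma = sorted(gamma)
    return {frozenset(c) for size in range(1, len(gamma)) for c in combinations(gamma, size)}

Q = restricted_link(T, {3}, {"a", "c", "d"})
```

Here `attribute_complex(Q)` coincides with `boundary_complex({"a", "c", "d"})`, which is condition (d) of Theorem 19 for this instance.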

10.7  Hidden  Holes 

We  saw  via  Theorem  19  that  whenever  a  set  of  attributes  7  minimally  identifies  some  set  of 
individuals  a,  then  the  link  of  a,  restricted  to  those  simplices  that  are  witnessed  by  attributes 
in  7,  defines  a  sphere  in  both  Dowker  complexes.  It  is  a  hole. 

All sets of individuals that are identifiable in some way, in other words, that appear in the doubly-labeled poset $P_R$ of a relation, must be minimally identifiable in some way. That suggests there must be holes everywhere in a relation's Dowker complexes, and yet we do not see many holes. What is going on?
Figure 31: The lattice $P_T^{+}$ for the tetrahedral relation of Figure 30. Each element is a pair of sets $(\sigma, \gamma)$ such that $\sigma = \psi_T(\gamma)$ and $\gamma = \phi_T(\sigma)$. (We have elided commas and braces in sets, for ease of viewing.) The lattice is isomorphic to the Boolean algebra on four elements, consistent with the fact that $T$ preserves both association and attribute privacy. If one removes the top and bottom elements, the remaining poset $P_T$ has $\mathbb{S}^2$ homotopy type.


The answer is that the restricted link construction $Q(\sigma, \gamma)$ focuses on a particular subrelation, thereby highlighting the hole. The hole itself could be hidden in the encompassing relation. For instance, we saw that relation $Q$ of Figure 32 defines an $\mathbb{S}^1$ hole. If $Q$ happened to be a subrelation of relation $R$ as in Figure 33, then $Q$ would not be a hole when viewed in $R$, merely a boundary.

Notice that the lattice $P_R^{+}$ is isomorphic to the lattice $P_Q^{+}$. The difference is that for every lattice element $(\sigma, \gamma)$, the set of individuals $\sigma$ includes 3 in $P_R^{+}$ but not in $P_Q^{+}$. Consequently, the bottom element $(3, acd)$ of $P_R^{+}$ is actually an element of the poset $P_R$, meaning $\Delta(P_R)$ is a cone, hence contractible. In contrast, the poset $P_Q$ does not contain the bottom element $(\emptyset, acd)$ of $P_Q^{+}$ and so $\Delta(P_Q)$ has $\mathbb{S}^1$ homotopy type.

Aside:  Why  not  always  focus  on  a  relation’s  lattice  rather  than  its  doubly-labeled  poset? 
Because  the  lattice  is  always  contractible.  Any  interesting  topology  lies  in  the  poset.  See  [18]. 

Conclusion:  Even  though  R  is  contractible,  it  offers  the  same  choices  for  informative 
attribute  release  sequences  as  does  Q.  More  generally,  the  analysis  of  this  subsection  suggests 
that  one  look  for  holes  in  subrelations  of  a  given  relation.  Looking  at  links  is  one  way  to  focus 
on  subrelations.  Removing  individuals  or  attributes  that  represent  cone  apexes  is  another, 
as  we  just  saw.  More  generally,  any  simplicial  cycle  that  can  be  represented  by  a  subrelation 
defines a useful hole even though the hole appears to be filled in. So long as one can remove any
coboundary of that cycle, by restricting the relation without destroying the cycle, the cycle is
informational.  In  particular,  it  offers  opportunities  for  informative  attribute  release  sequences, 
as  the  next  subsection  makes  precise. 
Figure 32: Relation $Q$ models $\mathrm{Lk}(\Psi_T, 3)$, with $T$ as in Figure 30.


Figure 33: Relation $R$ fills in the hole of relation $Q$ from Figure 32. It is still true that $Q$ models a link, namely $\mathrm{Lk}(\Psi_R, 3)$. $R$ and $Q$ have the same lattice structure, but the bottom element of $P_R^{+}$ defines the set of individuals $\{3\}$, whereas the bottom element of $P_Q^{+}$ defines the empty set. Thus relation $R$ defines a contractible poset for $P_R$, whereas relation $Q$ defines an $\mathbb{S}^1$ hole for $P_Q$.

10.8  Bubbles  are  Lower  Bounds  for  Privacy 

We have seen minimal identifiability characterized by holes, via Theorem 19. The previous subsections make clear that the topological characterization of $r_{\mathrm{slow}}$ is not so direct. In this subsection we establish a sufficient condition. We will see that holes provide lower bounds for $r_{\mathrm{slow}}$. We will focus on a relation and its links, but these results apply more generally to any hidden holes made visible by focusing on subrelations, as outlined in the previous subsection.

The connection between a relation's poset $P_R$ and its lattice $P_R^{+}$ suggests the following:

Definition 24 (Almost a Join-Based Lattice). Let $P$ be a finite poset. We say that $P$ is almost a join-based lattice if adjoining a top element $\hat{1}$ means $P \cup \{\hat{1}\}$ is a join semi-lattice.


Comments: (a) We adjoin $\hat{1}$ even if $P$ already has a top element. (b) Since $P$ is finite, if $P$ is almost a join-based lattice, then if we adjoin both a top element $\hat{1}$ and a bottom element $\hat{0}$, the result will be a lattice. See also [18].

This definition leads to our key insight (for a proof, see Appendix G):

Theorem 25 (Many Chains). Let $P$ be almost a join-based lattice. Suppose $P$ has reduced integral homology in dimension $k \geq 0$, that is, $\widetilde{H}_k(\Delta(P); \mathbb{Z}) \neq 0$.

Then there are at least $(k+2)!$ maximal chains in $P$ of length at least $k$.
Interpretation: The theorem says that a homology hole acts at least as powerfully as a spherical hole, from the perspective of producing informative attribute release sequences. Consider again the tetrahedral relation of Figure 30. The Dowker complexes form two-dimensional holes, so $k = 2$ and $(k+2)! = 24$. The poset $P_T$ is the proper part of the lattice shown in Figure 31, that is, all the elements except the topmost and bottom-most. There are indeed 24 different chains of length 2, i.e., containing three elements, in $P_T$.

These chains represent the 24 different ways in which one might start at a vertex of one of the Dowker complexes, walk from that vertex to the middle of an incident edge, then walk from the middle of that edge to the centroid of an encompassing triangle. For instance: the walk from the vertex $\{a\}$ to the edge $\{a, c\}$ to the triangle $\{a, c, d\}$ in $\Phi_T$. One can think of this walk as sequential acquisition of attribute information about an individual in a particular order. The order may perhaps be determined by chance or perhaps by an individual purposefully releasing information in a particular order. Once (and only once) one has arrived at the center of the triangle, has one fully identified the individual (in this case, as individual #3).
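The count of 24 can be checked by direct enumeration. The sketch below models the proper part of the Boolean lattice on four elements by the nonempty proper subsets of a four-element attribute set, ordered by inclusion. That modeling choice and the function names are illustrative assumptions, not the report's code.

```python
from itertools import combinations

def proper_part(ground):
    """Nonempty proper subsets of the ground set."""
    ground = sorted(ground)
    return [frozenset(c) for size in range(1, len(ground)) for c in combinations(ground, size)]

def maximal_chains(elements):
    """All maximal chains of a finite family of sets ordered by inclusion."""
    elements = list(elements)

    def covers(lower, upper):
        # upper covers lower: strictly above, with nothing strictly in between.
        return lower < upper and not any(lower < mid < upper for mid in elements)

    minimal = [a for a in elements if not any(c < a for c in elements)]
    chains = []

    def extend(chain):
        ups = [b for b in elements if covers(chain[-1], b)]
        if not ups:
            chains.append(chain)
        for b in ups:
            extend(chain + [b])

    for a in minimal:
        extend([a])
    return chains

chains = maximal_chains(proper_part({"a", "b", "c", "d"}))
```

Each resulting chain has three elements (a vertex, an edge, a triangle), and there are $4 \cdot 3 \cdot 2 = 24$ of them, one of which is $\{a\} \subset \{a, c\} \subset \{a, c, d\}$.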

With  that  observation,  we  finally  see  how  the  global  geometry  /  topology  of  the  Dowker 
complexes,  as  encoded  in  their  common  poset,  affects  inference,  beyond  the  local  simplicial 
collapses  of  the  closure  operators.  We  will  presently  formalize  this  insight  via  two  corollaries 
to  Theorem  25. 


Figure 34: Relation $R$ describes three individuals all of whom have the exact same three attributes. The Dowker complexes are both triangles, but the poset $P_R$ is a single point. This single point captures the indistinguishability of the individuals and the attributes. In fact, $P_R = P_R^{+}$, meaning one can infer everything from nothing.

We  caution  that  the  dimension  of  a  simplex  in  a  Dowker  complex  is  not  meaningful  in  and 
of  itself,  since  the  simplex  may  collapse  under  the  closure  operators.  (Consider  the  example  of 
Figure  34,  in  which  the  Dowker  complexes  are  full  triangles,  but  the  doubly-labeled  poset  is 
a  single  point.)  Instead,  the  length  of  chains  in  a  relation’s  poset  is  significant.  Holes  prevent 
these  chains  from  being  short. 

Corollary 26 (Holes Reduce Inference). Let $R$ be a relation. Suppose $P_R$ has reduced integral homology in dimension $k \geq 0$. Then there are at least $(k+2)!$ maximal chains in $P_R$ of length at least $k$.

Corollary 27 (Holes Defer Recognition). Let $R$ be a relation and let $(\sigma, \gamma) \in P_R$.

Define $Q = Q(\sigma, \gamma)$ as per Definition 18 and recall Definition 16, from pages 42-43. Suppose $P_Q$ has reduced integral homology in dimension $k \geq 0$.

Then there are at least $(k+2)!$ distinct informative attribute release sequences $y_1, \ldots, y_\ell$ for $R$, each with $\ell \geq k+2$, such that $\psi_R(\{y_1, \ldots, y_\ell\}) = \sigma$. Consequently, $r_{\mathrm{slow}}(\sigma) \geq k+2$.
Comment: Since $(\sigma, \gamma) \in P_R$ and by the homology hypothesis, relation $Q(\sigma, \gamma)$ models link $\mathrm{Lk}(\Psi_R, \sigma)$.


Collaboration Example Once Again: The Dowker complexes for the travel guide example of Figure 23 have $\mathbb{S}^1$ homotopy type, meaning $P_G$ has homology in dimension $k = 1$. Corollary 26 therefore says that there are at least 6 maximal informative attribute release sequences in $P_G$. Being maximal, each must identify some author. In fact, we saw that there were 4 different maximal informative attribute release sequences for identifying any one author. Since there are 5 authors, $P_G$ actually contains at least 20 distinct maximal informative attribute release sequences. Can we find more via our corollaries? Not directly for individual authors, since, as we saw via Figure 26, the link of any one author is contractible, meaning that Corollary 27 does not help us directly.

There is more to be said however: The proof of Theorem 25 actually establishes that, for certain representatives of a homology class, the maximal elements in the support of that representative give rise to $(k+1)!$ chains. In the collaboration example, by choosing the homology generator appropriately, this implies that for each author there are at least two informative attribute release sequences for identifying the author. That gives us 10 sequences overall for relation $G$. To find 20, we would have to examine links of pairs of authors. There are 10 such links, 5 of which look similar to the one in Figure 28 on page 41. Each is an instance of $\mathbb{S}^0$, meaning each has two different identifying iars. That therefore gives us 10 iars for identifying pairs of authors, and thus 20 iars for identifying individual authors. Corollary 27 further allows us to conclude that the maximal length of an informative attribute release sequence that identifies a given pair of authors is at least two. Consequently, the maximal length of an informative attribute release sequence that identifies a given individual author must be at least (and thus exactly) three. Observe that one can draw these various conclusions guided by homology.


11  Experiments 

An  individual  may  wish  to  reveal  information  about  himself/herself  while  delaying  full 
identification.  We  saw  in  Section  10.8  that  homology  provides  a  lower  bound  on  the  number 
and  length  of  such  informative  attribute  release  sequences.  The  lower  bound  need  not  be  tight. 
In  order  to  test  the  existence  of  the  lower  bound  as  well  as  see  that  it  is  not  tight,  we  examined 
two  datasets  of  different  character: 

Medals:  We  obtained  this  dataset  in  August  2014  from 

http://www.tableausoftware.com/public/community/sample-data-sets.

The  dataset  contains  information  about  athletes  who  participated  in  the  Olympics  during 
the  years  2000-2012.  The  attributes  we  considered  were: 

Age,  Country,  Year,  Sport,  Gold  Medals,  Silver  Medals,  Bronze  Medals 

(The  last  three  attributes  count  the  number  of  medals  won  by  an  athlete.) 

Every  athlete  therefore  has  exactly  7  attributes,  with  each  attribute  taking  on  one  of 
a  finite  discrete  set  of  mutually  exclusive  values.  We  represented  these  7  dimensions  of 
multivalent  attributes  as  a  collection  of  233  binary  attributes. 

There  are  8613  individuals  (we  regarded  the  same  athlete  in  different  years  as  distinct 
individuals),  who  partition  into  6955  equivalence  classes  (for  team  sports,  athletes  are 
often  indistinguishable) . 

The  result  is  a  binary  relation  M  with  6955  rows  and  223  columns. 


Jazz: We assembled this relation in June 2015 by examining the website
http://www.redhotjazz.com.

The website contains information about jazz musicians and bands, mainly from the early to late-mid 20th century.

We assembled a relation $J$ whose rows are indexed by musicians and whose columns are indexed by bands, with $(m, b) \in J$ meaning that musician $m$ played in band $b$.

The  result  is  a  binary  relation  J  with  4896  rows  and  990  columns. 

Cautions:  We  were  not  particularly  careful  to  determine  whether  different  spellings  of 
a  name  really  meant  the  same  person.  For  some  bands,  the  website  listed  one  or  more 
bandmembers  as  “unknown”.  We  ignored  those  bandmembers.  We  ignored  bands  for 
whom  we  could  not  determine  any  bandmembers.  Since  our  goal  was  to  understand  how 
homology  influences  the  existence  of  informative  attribute  release  sequences,  such  noise 
in  constructing  a  relation  should  not  be  particularly  significant.  If  one  wished  to  draw 
sociological  conclusions  about  the  spread  of  music,  one  would  need  to  be  more  careful. 
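The encoding step described for the Medals dataset, turning multivalent fields into binary attributes and then merging indistinguishable individuals, can be sketched as follows. The records, field subset, and function names below are invented for illustration and are not taken from the dataset.

```python
from collections import defaultdict

def to_binary_relation(records, fields):
    """Represent each record by the set of (field, value) pairs it exhibits.

    Each distinct (field, value) pair plays the role of one binary attribute.
    """
    return {
        name: frozenset((field, record[field]) for field in fields)
        for name, record in records.items()
    }

def equivalence_classes(relation):
    """Group individuals whose rows in the binary relation are identical."""
    classes = defaultdict(list)
    for name, attributes in relation.items():
        classes[attributes].append(name)
    return dict(classes)

# Hypothetical records, using only three of the seven fields for brevity.
records = {
    "athlete_A": {"Country": "X", "Year": 2008, "Sport": "Rowing"},
    "athlete_B": {"Country": "X", "Year": 2008, "Sport": "Rowing"},
    "athlete_C": {"Country": "Y", "Year": 2008, "Sport": "Judo"},
}
relation = to_binary_relation(records, ["Country", "Year", "Sport"])
classes = equivalence_classes(relation)
```

In this toy input the two hypothetical teammates collapse into one class, so three records yield two rows, each with exactly one binary attribute per field.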

We  encountered  the  jazz  website  because  it  was  the  source  of  data  for  a  paper  on 
collaboration  networks  [11]  that  considered  the  dual  nature  of  individuals  and  attributes.  That 
paper  constructed  two  graphs,  one  with  musicians  as  vertices  and  bands  as  edges,  the  other 
with  those  roles  reversed.  One  can  view  those  graphs  as  the  1-skeleta  of  our  Dowker  complexes. 
The paper observed that drawing conclusions in one space might be easier than in the other, depending on the question being asked. The paper drew some conclusions about musical influence.


11.1  Compare  and  Contrast 

We  review  some  key  differences  between  the  two  relations  M  and  J. 


Identifiability:  The  original  8613  individuals  in  the  Olympic  Medals  dataset  were  not  all 
uniquely  identifiable.  For  some  athletes,  even  knowing  an  athlete’s  full  set  of  7  attributes 
left  ambiguity  as  to  the  athlete’s  identity.  This  was  true  for  2810  of  the  athletes. 
Fortuitously,  an  athlete’s  ambiguity  was  fully  symmetric,  meaning  that  one  could  in  fact 
partition  the  set  of  all  athletes  into  equivalence  classes.  This  symmetry  was  likely  due  to 
the  fact  that  some  competitions  involved  teams,  with  team  members  indistinguishable 
from  each  other.  Each  equivalence  class  then  formed  a  uniquely  identifiable  “individual” 
in  relation  M. 

For  the  Jazz  relation,  863  of  the  4896  musicians  were  uniquely  identifiable,  but  4033  were 
not.  Unfortunately,  this  time  the  ambiguity  was  not  fully  symmetric.  One  could  again 
partition  the  4033  individuals  into  1022  equivalence  classes  based  on  having  identical 
rows  in  J.  However,  some  rows  remained  subsets  of  other  rows,  giving  a  directionality 
to  the  ambiguity.  For  this  reason,  we  did  not  pass  to  equivalence  classes. 


Attribute  Size:  In  the  medals  relation  M ,  every  individual  has  exactly  7  attributes, 
describing  one  value  for  each  of  the  7  possible  fields:  Age,  Country,  Year,  Sport, 
Gold  Medals,  Silver  Medals,  Bronze  Medals.  Consequently  there  are  also  always 
exactly  7  attributes  in  each  link  relation. 

In  the  Jazz  dataset,  there  was  no  structural  bound  to  the  number  of  bands  in  which 
a  musician  might  have  played,  so  a  musician’s  attributes  could  be  many.  The  largest 
number  of  bands  in  which  any  one  musician  played  was  in  fact  44.  The  average  was  a 
little  over  2  and  the  median  1.  Conversely,  the  largest  band  had  288  musicians,  with  an 
average  of  10.4  and  a  median  of  7. 

Link Size: For $M$, the number of other athletes in any given athlete's link was always close to the entire set of possible athletes. With only 7 attribute fields, any two athletes almost certainly shared some attribute value.

In  contrast,  for  the  767  musicians  in  J  for  whom  we  computed  links  (described  further  in 
Section  11.4),  the  number  of  other  musicians  in  any  given  musician’s  link  was  relatively 
low.  The  average  was  55.3,  the  median  37,  with  a  maximum  of  301.  With  musicians 
generally  playing  in  few  bands,  each  encountered  on  average  only  a  few  score  fellow 
musicians  of  the  4895  other  musicians  he/she  might  have  encountered. 
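The identifiability and link-size notions compared above can be made concrete with a short sketch. It assumes that an individual is uniquely identifiable exactly when no other individual's row contains its row, and that the individuals in a link are those sharing at least one attribute. The toy relation and function names are hypothetical.

```python
def confusable_with(relation, x):
    """Other individuals whose rows contain the row of x.

    If this set is empty, releasing all of x's attributes identifies x uniquely.
    """
    row = relation[x]
    return {y for y, other in relation.items() if y != x and row <= other}

def link_members(relation, x):
    """Other individuals sharing at least one attribute with x."""
    row = relation[x]
    return {y for y, other in relation.items() if y != x and row & other}

# Hypothetical musicians (rows) and bands (attributes).
J = {
    "m1": {"b1"},
    "m2": {"b1", "b2"},
    "m3": {"b3"},
    "m4": {"b3"},
}
```

In this toy relation the ambiguity between `m3` and `m4` is symmetric, since their rows are identical. The ambiguity of `m1` is directional: `m1` can be confused with `m2`, but `m2` is uniquely identifiable.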


11.2  Homology  Computations 

For each of the link relations discussed below, we computed homology of the Dowker complex $\Phi_Q$ with relation $Q$ modeling the link.$^3$ Since our goal was to find lower bounds for informative attribute release sequences, we modified $\Phi_Q$ slightly, as suggested by Section 10.7. Specifically, whenever $\Phi_Q$ was a cone with more than one maximal simplex, we removed all its cone apexes.

Note: The homology lower bound results of Section 10 and Appendix G do not depend directly on the chain coefficients being integers. We therefore computed homology with $\mathbb{Z}_2$ coefficients, using the Perseus software previously written at the University of Pennsylvania:

http://www.sas.upenn.edu/~vnanda/perseus/.
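To make the kind of computation concrete, the following sketch computes (unreduced) Betti numbers over $\mathbb{Z}_2$ for a small simplicial complex given by its maximal simplices. It is a plain stand-in written for illustration, using Gaussian elimination on bitmask rows. It is not the Perseus software and does not reproduce its interface, and all function names are hypothetical.

```python
from itertools import combinations

def faces_by_dimension(maximal_simplices):
    """All nonempty faces of the maximal simplices, as sorted tuples keyed by dimension."""
    by_dim = {}
    for simplex in maximal_simplices:
        vertices = sorted(simplex)
        for size in range(1, len(vertices) + 1):
            for face in combinations(vertices, size):
                by_dim.setdefault(size - 1, set()).add(face)
    return {dim: sorted(faces) for dim, faces in by_dim.items()}

def rank_mod2(rows):
    """Rank over Z_2 of a matrix whose rows are Python ints used as bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def betti_numbers_mod2(maximal_simplices):
    """Betti numbers beta_0, ..., beta_top with Z_2 coefficients."""
    faces = faces_by_dimension(maximal_simplices)
    top = max(faces)
    index = {dim: {f: i for i, f in enumerate(fs)} for dim, fs in faces.items()}
    ranks = {0: 0, top + 1: 0}  # boundary maps out of dimension 0 and into top are zero
    for dim in range(1, top + 1):
        rows = []
        for face in faces[dim]:
            mask = 0
            for sub in combinations(face, dim):  # codimension-one faces
                mask |= 1 << index[dim - 1][sub]
            rows.append(mask)
        ranks[dim] = rank_mod2(rows)
    return [len(faces[dim]) - ranks[dim] - ranks[dim + 1] for dim in range(top + 1)]

# Boundary of a tetrahedron (a 2-sphere) and a 4-cycle (a circle).
tetra_boundary = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"), ("b", "c", "d")]
four_cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
```

For the tetrahedron boundary the Betti numbers are `[1, 0, 1]`, and for the 4-cycle they are `[1, 1]`, as expected for a 2-sphere and a circle respectively.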

11.3  Homology  and  Release  Sequences  in  the  Olympic  Dataset 

Overall Homology: A collection of $k$ multivalent discrete attributes produces Dowker complexes with homotopy types that are wedges of $\mathbb{S}^{k-1}$s, assuming that all possible combinations of the attributes are represented by individuals.

Consequently,  with  every  individual  having  exactly  7  attributes,  one  might  expect  to  see 
some  homology  in  dimension  6.  But  of  course,  not  every  combination  is  possible.  For  instance, 
no  one  athlete  is  going  to  simultaneously  win  the  gold,  silver,  and  bronze  medals  in  the  same 
event.  From  this  perspective,  real-world  constraints  show  up  as  absence  of  potential  homology. 
In fact, relation $M$ has the Betti numbers described in Table 2, computed using $\mathbb{Z}_2$ coefficients.

    d            0     1     2     3     4
    $\beta_d$    1     0    23   757   503

Table 2: Betti numbers for the topology of the Olympic Medals relation.

The  table  suggests  that  there  are  quite  a  few  informative  attribute  release  sequences  of 
length  at  least  5  for  identifying  athletes. 

Link Homology: We computed the link of each athlete in $M$, and determined homology for the resulting relation, with the proviso mentioned above. Specifically, we removed all cone apexes from an athlete's Dowker complex $\Phi_Q$ (assuming it contained more than one maximal simplex) before computing homology, with $Q$ being the link relation. Of the 6955 links, 3822 contained attribute cone apexes in $\Phi_Q$.

Table  3  summarizes  the  results.  One  may  conclude  more  strongly  now  that  (at  least)  2198 
athletes  each  have  (at  least)  120  different  ways  of  releasing  (at  least)  5  of  their  7  attributes  in 
ways  that  do  not  fully  identify  the  athlete  before  those  5  attributes  have  been  released. 

Informative Attribute Release Sequences: We computed a maximal length informative attribute release sequence for each link relation. One can find such a sequence by searching for a least-cost path from $\hat{1}_Q$ to $\hat{0}_Q$ in $P_Q^{+}$, picking attributes along the way as per the construction of Lemma 21 on page 44, with cost being the number of attributes inferred as one traverses the path. Here $Q$ is again the link relation. Of the 6955 athletes, 6229 actually had a maximal informative attribute release sequence of length 7. Each such athlete could order his/her attributes in such a way that his/her identity does not become fully known until s/he has released all 7 attributes. Of the remaining athletes, 719 had a maximal informative attribute release sequence of length 6, and 7 had a maximal length of 5.

$^3$ Formally, the link is equal to $\Psi_Q$. By Dowker's Theorem, $\Psi_Q$ and $\Phi_Q$ have the same homology.

    d                              0      1      2      3      4
    # of athletes                229   1355   2773   2198     57
    max $\beta_d$ over athletes    2      4      7      4      2

Table 3: Histogram indexed by dimension $d$, describing athletes whose links $\mathrm{Lk}(\Psi_M, \text{athlete})$ have homology in dimension $d$ (after removal of attribute cone apexes from the dual complexes), for the 6955 athletes in the Olympic Medals relation $M$. Also shown are the maximum Betti numbers seen in each dimension, with the maximum taken over all possible athletes.

Of  course,  Corollary  27  makes  a  stronger  claim,  suggesting  permutability  of  attributes. 
Consequently,  we  computed  for  each  link  relation  all  possible  isotropic  sets  of  attributes  (see 
again  Definition  14  on  page  41,  now  with  Q  in  place  of  R).  Table  4  summarizes  the  results. 


    $|\kappa|$                          2      3      4      5      6
    # of athletes                    6955   6955   6955   5568    171
    max $|\{\kappa\}|$ over athletes   21     35     35     21      5

Table 4: Histogram indexed by size $|\kappa|$, describing athletes whose link relations contain isotropic attribute sets $\kappa$. An athlete may have several distinct (possibly overlapping) such sets for any given size. Also shown therefore are the maximum numbers of such sets, with the maximum taken over all possible athletes. For example: 171 athletes have at least one isotropic set of size 6 in their link relation, and the maximum number of such sets any one athlete has is 5.


Scatterplot: Finally, we computed for each link a pair of numbers $(h, i)$, with $h$ representing a measure of link homology and $i$ representing a measure of informative attribute release sequences for the link relation. The resulting scatterplot appears in Figure 35. One can see that homology acts as a lower bound for informative attribute release sequences.

The exact formulas for h and i are not that significant, but we mention them here for
completeness. To get a measure of homology, we assembled for each link a vector with the
Betti numbers computed earlier: (β_0, β_1, β_2, β_3, β_4). We looked at all such vectors to determine
maximum values for each component (as given in Table 3). We could then think of the vector as
defining, in reverse order, a varying-radix numeral. We converted that numeral to an integer.
So, for instance, if a given link were to have Betti vector (1, 3, 5, 0, 0), then its h value would
be 1 + 3·(2 + 1) + 5·(4 + 1)·(2 + 1) = 85. Observe that the h value for a contractible link is 1.
In order to graph the scatterplot nicely, we scaled the h-axis by taking a fourth root.
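The varying-radix conversion can be sketched in Python as follows. This is a minimal illustration, not the code used for the report. The function name and the tuple `ASSUMED_MAXIMA` are hypothetical. The first two maxima (2 and 4) are inferred from the factors (2 + 1) and (4 + 1) in the worked example; the last three are placeholders, since Table 3 is not reproduced here.

```python
from typing import Sequence


def mixed_radix_value(digits: Sequence[int], maxima: Sequence[int]) -> int:
    """Read `digits` as a varying-radix numeral, least significant digit first.

    Position j uses radix maxima[j] + 1, so every digit 0..maxima[j] fits.
    """
    if len(digits) != len(maxima):
        raise ValueError("digits and maxima must have the same length")
    value, weight = 0, 1
    for digit, maximum in zip(digits, maxima):
        if not 0 <= digit <= maximum:
            raise ValueError("digit lies outside its stated range")
        value += digit * weight
        weight *= maximum + 1
    return value


# Maxima 2 and 4 are inferred from the worked example in the text;
# the values 5, 1, 1 are placeholders, NOT taken from Table 3.
ASSUMED_MAXIMA = (2, 4, 5, 1, 1)

h_example = mixed_radix_value((1, 3, 5, 0, 0), ASSUMED_MAXIMA)
h_contractible = mixed_radix_value((1, 0, 0, 0, 0), ASSUMED_MAXIMA)
h_scaled = h_example ** 0.25  # fourth-root scaling applied to the h-axis
```

Because β_0 is the least significant digit, a contractible link (β_0 = 1, all other Betti numbers 0) maps to 1 regardless of the maxima chosen.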

We computed a link’s i value similarly, now from the following vector of data:
(ℓ_max, c_2, c_3, c_4, c_5, c_6). Here ℓ_max is the largest ℓ in an informative attribute release sequence
y_1, ..., y_ℓ for the link relation, while c_k is the number of different isotropic attribute sets κ in
the link relation such that |κ| = k. We scaled the i-axis by taking a logarithm.

Figure 35: Scatterplot describing each athlete’s link in the medals relation M. The scatterplot
shows for each link a point (h, i), with h a measure of the link’s homology (after removal of
attribute  cone  apexes)  and  i  a  measure  of  how  many  significant  informative  attribute  release 
sequences  exist  for  the  link  relation.  The  scatterplot  underscores  how  homology  is  a  lower 
bound  for  informative  attribute  release  sequences,  as  described  in  Corollary  27. 

(The  colors  and  radii  indicate  the  numbers  of  athletes  in  the  links.  The  color  ordering  and  size 
boundaries  are: 

BLACK - 6821 - SILVER - 6831 - ORANGE - 6851 - GREEN - 6859 - BLUE - 6865 - MAGENTA - 6872 - RED.

In  this  figure,  the  boundaries  between  colors  were  chosen  so  that  each  bucket  holds  roughly 
1000  links.  As  one  can  see,  the  number  of  athletes  in  a  link  is  generally  large.) 

11.4 Homology and Release Sequences in the Jazz Dataset

Overall  Homology:  Given  the  large  number  of  bands  in  which  some  musicians  played,  and 
given  memory  constraints  of  our  laptop  at  the  time,  we  were  not  able  to  compute  homology 
for  the  full  Jazz  relation  J.  Instead,  we  were  able  to  compute  the  homology  for  restricted 
relations  consisting  of  musicians  that  played  in  fewer  than  20  bands.  This  covered  4856  of  the 
4896  musicians  in  the  overall  relation.  Since  we  did  not  see  any  homology  above  dimension  2 
in several of these restricted cases, we considered the 3-skeleton of Φ_J as a proxy for the topology
of the full relation J and computed its homology. Table 5 summarizes the results. We verified
using a graph algorithm that the full relation J did indeed have 107 components, as indicated
by β_0 for the 3-skeleton of Φ_J. Given the low amount of homology and the low dimension of such
homology,  J  might  not  be  telling  us  much  about  the  length  of  informative  attribute  release 
sequences  for  the  various  musicians,  suggesting  we  look  at  links. 


b     Σ          m      β_0    β_1    β_2    β_3
14    Φ_{J|b}    4819   111    613    20     0
15    Φ_{J|b}    4831   111    613    32     0
16    Φ_{J|b}    4838   111    605    42     0
17    Φ_{J|b}    4848   110    603    58     0
18    Φ_{J|b}    4851   110    603    65     0
19    Φ_{J|b}    4856   109    596    75     0
∞     Φ_J^(3)    4896   107    550    93     -
15    Φ_{J'}     767    18     595    32     0

Table 5: Betti numbers for subcomplexes Σ of Φ_J, with J being the Jazz relation. The first
six rows correspond to restrictions of J to musicians who played in at most b bands. For each
row, m indicates the number of musicians in the relation. The penultimate row describes the
3-skeleton of Φ_J. The last row refers to a relation J' described further in the text.

Link  Homology:  We  computed  the  link  of  some  of  the  musicians  in  J,  and  determined 
homology  for  the  resulting  relations  (again  after  removal  of  attribute  cone  apexes).  Table  6 
summarizes  the  results.  Given  the  inability  to  fully  identify  some  musicians  even  knowing 
all  their  bands  (as  described  in  Section  11.1)  and  the  difficulty  of  computing  homology 
for  large  band  memberships,  we  computed  links  only  for  a  subset  of  the  musicians.  We 
required  each  musician  to  be  uniquely  identifiable,  to  have  played  in  at  most  15  bands,  and 
to  have  a  nontrivial  link.  There  were  767  such  musicians.  Betti  numbers  for  the  relation 
J'  representing  the  restriction  of  J  to  these  767  musicians  also  appear  in  Table  5.  (Note, 
however, that we computed the full link Lk(Ψ_J, musician) for each of the 767 musicians, not
merely Lk(Ψ_J', musician).) We removed attribute cone apexes from the link relation for 106 of
these  767  musicians. 

These  results  suggest  that  the  relationships  to  other  musicians  do  indeed  not  have  many 
holes  in  them.  Recall,  from  a  topological  perspective,  one  can  assert  the  existence  of  at  least 
(k + 2)! distinct informative attribute release sequences of length at least k + 2 for any musician
with a k-dimensional hole. For almost all musicians this means 2 sequences of length 2, for

d                            0      1      2      3
# of musicians             604    145     20      1
max β_d over musicians       7      6      3      1

Table 6: Histogram indexed by dimension d, describing musicians whose links Lk(Ψ_J, musician)
have  homology  in  dimension  d  (after  removal  of  attribute  cone  apexes  from  the  dual  complexes), 
for  the  767  musicians  who  are  uniquely  identifiable  in  J,  played  in  at  most  15  bands,  and  had 
nontrivial  link.  Also  shown  are  the  maximum  Betti  numbers  seen  in  each  dimension,  with  the 
maximum  taken  over  the  767  possible  musicians.  For  d  =  0,  this  means  that  604  of  the  767 
musicians  have  connections  to  other  musicians  that  split  into  disjoint  groups.  The  maximum 
number  of  such  disjoint  components  for  any  one  musician  is  7. 

some  it  means  6  sequences  of  length  3,  for  a  few  it  means  24  sequences  of  length  4,  and  for  one 
musician  it  means  120  sequences  of  length  5.  These  observations  are  roughly  in  line  with  the 
actual  data  for  informative  attribute  release  sequences  described  next,  though,  as  expected  for 
the  theoretical  reasons  discussed  earlier,  they  constitute  lower  bounds. 
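The topological lower bound quoted above, at least (k + 2)! distinct informative attribute release sequences of length at least k + 2 for a k-dimensional hole, can be tabulated with a short Python sketch. The function name is hypothetical, and the code only evaluates the stated bound; it does not compute release sequences from data.

```python
from math import factorial
from typing import Tuple


def release_sequence_lower_bound(k: int) -> Tuple[int, int]:
    """For a k-dimensional hole, return (count, length): at least `count`
    distinct informative sequences exist, each of length at least `length`."""
    if k < 0:
        raise ValueError("hole dimension must be nonnegative")
    return factorial(k + 2), k + 2


# Dimensions 0..3 are the ones that occur in Table 6.
bounds = {k: release_sequence_lower_bound(k) for k in range(4)}
```

For k = 0, 1, 2, 3 this gives the pairs (2, 2), (6, 3), (24, 4), and (120, 5) cited in the text.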

Informative  Attribute  Release  Sequences:  We  computed  a  maximal  length  informative 
attribute  release  sequence  for  each  link  relation.  Table  7  summarizes  the  results.  We  mention  in 
passing:  Any  attribute  release  sequence  that  is  informative  for  a  link  relation  is  also  informative 
for  the  encompassing  relation.  For  a  few  musicians,  the  maximal  sequence  found  within  the 
link  relation  Q  could  be  further  extended  in  the  encompassing  relation  J,  with  a  prefix  of  one 
attribute,  namely  an  attribute  shared  by  all  members  of  the  link,  yet  remain  informative  and 
identifying. This occurred for the 17 musicians whose maximum sequence length ℓ was 1.

We  also  computed  for  each  link  relation  all  possible  isotropic  sets  of  attributes.  Table  8 
summarizes  those  results. 


ℓ                  1     2     3     4     5     6     7     8     9    10    11
# of musicians    17   248   218   125    72    35    23    15    11     2     1

Table 7: Histogram of musicians, indexed by length ℓ of the longest informative attribute
release sequence for the musician’s link relation, for the 767 musicians described in the text.


|κ|                           2      3      4      5
# of musicians              750    219     49      3
max |{κ}| over musicians    105    202     40      2

Table 8: Histogram indexed by size |κ|, describing musicians whose link relations contain
isotropic attribute sets κ. Also shown are the maximum numbers of such sets, with the
maximum taken over the 767 possible musicians described in the text.

Scatterplot: We computed for each link a pair of numbers (h, i), with h representing a
measure  of  homology  and  i  representing  a  measure  of  the  link’s  informative  attribute  release 
sequences,  much  as  for  the  medals  relation  M  of  Section  11.3.  The  resulting  scatterplot  appears 
in  Figure  36.  Again  homology  acts  as  a  lower  bound  for  informative  attribute  release  sequences. 

Figure 36: Scatterplot describing the links computed for 767 of the musicians in the Jazz
relation J. The scatterplot shows for each link a point (h, i), with h a measure of the link’s
homology  (after  removal  of  attribute  cone  apexes)  and  i  a  measure  of  the  link’s  informative 
attribute  release  sequences. 

(The  colors  and  radii  indicate  the  numbers  of  musicians  in  the  links.  Link  sizes  are  fairly  small. 
The  color  ordering  and  size  boundaries  are: 

BLACK - 5 - SILVER - 10 - ORANGE - 20 - GREEN - 50 - BLUE - 100 - MAGENTA - 200 - RED.

In  this  figure,  the  buckets  may  hold  noticeably  varying  numbers  of  links.) 

12 Inference in Sequence Lattices

We  have  seen  how  a  relation  gives  rise  to  a  lattice  via  the  Galois  connection,  as  per  Definition  12 
on  page  38.  The  lattice  structure  describes  the  ways  in  which  privacy  may  be  preserved  or 
lost.  Consequently,  when  thinking  about  privacy,  one  may  be  able  to  start  with  a  lattice  that 
does  not  necessarily  arise  directly  from  a  relation. 

This  section  will  look  at  inferences  from  sequences  of  observations.  The  next  section 
examines  strategy  obfuscation  in  planning  with  uncertainty. 

We  should  also  recall  some  equivalences.  Lattices  are  particular  kinds  of  partially  ordered 
sets  (posets).  Posets  and  simplicial  complexes  are  topologically  identical  [18].  One  can 
move  back  and  forth  between  these  representations  while  preserving  homeomorphism  type 
(see  Appendix  A).  Furthermore,  one  may  describe  a  simplicial  complex  by  a  relation  in 
several  different  ways  that  preserve  homotopy  type,  including  ways  in  which  one  of  the  two 
resulting  Dowker  complexes  is  identical  to  the  original  simplicial  complex.  In  short,  one  has 
three  different  categories  of  structures  with  which  to  think  about  privacy:  relations,  simplicial 
complexes,  and  lattices.  One  may  start  with  any  one  representation  and  build  the  other  two 
from  that. 

12.1  Sequence  Lattices  for  Dynamic  Attribute  Observations 


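[Figure 37 diagram: over two successive times t_1 and t_2, individual 1 emits “a” then “b” or “b”
then “a”; individual 2 emits “a” twice or “b” twice; individual 3 emits “c” twice.]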


Figure  37:  Three  types  of  individuals  and  the  attributes  each  might  reveal  in  two  successive 
time  intervals. 


Consider  the  dynamic  process  of  Figure  37.  The  process  models  observations  of  individuals 
who  reveal  attributes  over  successive  time  steps.  There  are  three  possible  individuals  (or  more 
generally,  types  of  individuals).  The  first  individual  emits  attributes  “a”  and  “b”  alternatingly 
at  successive  times,  but  one  does  not  know  which  of  those  attributes  one  might  see  first.  The 
second  individual  always  emits  the  same  attribute,  either  “a”  or  “b”,  but  one  does  not  know 
a  priori  which  it  is.  The  third  individual  always  emits  the  same  attribute  “c”. 

A  relation  for  these  (types  of)  individuals  that  models  the  individuals  in  terms  of  single 
attributes  appears  as  relation  S  in  Figure  38.  Individual  #3  is  distinguishable  from  the  other 

S     a    b    c
1     •    •
2     •    •
3               •

T     aa   bb   ab   ba   cc
1               •    •
2     •    •
3                         •

Figure  38:  Relation  S  describes  individuals  and  single  attributes,  while  T  describes  individuals 
and  sequences  of  two  attributes. 


two  individuals,  but  the  relation  provides  no  means  for  distinguishing  those  two  individuals 
from  each  other.  The  relation  is  homogeneous  with  regard  to  single  attributes  for  individuals 
#1  and  #2.  Of  course,  we  can  see  from  the  dynamic  process  of  Figure  37,  that  distinguishing 
information  appears  via  sequences  of  two  attributes.  Relation  T  of  Figure  38  models  such 
sequences.  Now  all  three  individuals  are  uniquely  identifiable.  Should  one  wish  to  model 
inferences based on both one and two observations, one could use the relation S ⊔ T.
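A small Python sketch makes the comparison between S and T concrete. It is an illustration under stated assumptions: the relation entries follow the description in the text (individuals #1 and #2 both have single attributes “a” and “b”; individual #1 has sequences “ab” and “ba”, individual #2 has “aa” and “bb”, individual #3 has “cc”), and the helper names are hypothetical.

```python
# Relations of Figure 38 as sets of (individual, attribute) pairs,
# transcribed from the description in the text.
S = {(1, "a"), (1, "b"), (2, "a"), (2, "b"), (3, "c")}
T = {(1, "ab"), (1, "ba"), (2, "aa"), (2, "bb"), (3, "cc")}
# The attribute sets of S and T are disjoint, so a plain union models S ⊔ T.
S_UNION_T = S | T


def consistent_individuals(relation, observed):
    """Individuals related to every attribute in `observed`."""
    individuals = {x for x, _ in relation}
    return {x for x in individuals
            if all((x, y) in relation for y in observed)}


def attributes_of(relation, individual):
    """All attributes the relation assigns to `individual`."""
    return {y for x, y in relation if x == individual}


def uniquely_identifiable(relation, individual):
    """True if releasing all of the individual's attributes singles it out."""
    released = attributes_of(relation, individual)
    return consistent_individuals(relation, released) == {individual}
```

With these definitions, individuals #1 and #2 are not uniquely identifiable in S, while all three individuals are uniquely identifiable in T.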


[Figure 39 diagram; node labels include ({1,2}, a), ({3}, c), and ({1,2}, b).]


Figure  39:  Lattice  representing  the  dynamic  process  of  Figure  37. 


That  jump  from  single  to  double  attributes  is  useful,  but  where  does  it  come  from 
intrinsically?  After  all,  without  additional  knowledge,  we  might  simply  consider  infinitely  long 
sequences,  even  though  those  would  not  add  anything  in  this  example.  In  fact,  the  dynamic 
process  of  Figure  37  gives  us  the  information.  It  is  itself  basically  a  decision  tree  that  amounts 
to  the  lattice  of  Figure  39.  In  that  figure,  we  have  annotated  each  internal  node  of  the  lattice 
with  a  pair,  consisting  of  a  set  of  individuals  and  either  a  single  attribute  or  a  sequence  of  two 
attributes.  This  lattice  differs  from  previous  ones  in  this  report  in  that  a  set  of  individuals 
(or  attributes  more  generally)  is  no  longer  constrained  to  appear  in  at  most  one  node  of  the 
lattice.  By  allowing  multiple  nodes,  we  enhance  our  ability  to  encode  state  in  the  lattice. 
For  example,  observing  attribute  “a”  carries  different  meaning  depending  on  whether  one  has 
already  seen  attribute  “a”  or  attribute  “b”  or  no  attribute  at  all.  Also:  While  we  could  have 
included  ({3},  cc)  in  the  lattice,  we  did  not  need  that  element. 

In  the  lattice  of  Figure  39  it  is  tempting  to  merge  the  two  identifying  nodes  for  individual 
#1 into one node and to merge the two identifying nodes for individual #2 into one node.
There  is  apparently  no  harm  in  doing  so,  in  that  the  decision  process  would  still  be  correct. 
However,  the  resulting  structure  would  no  longer  be  a  lattice  but  merely  a  poset.  That  may 
or  may  not  be  desirable  in  a  given  application.  For  instance,  using  homology  to  estimate  how 
long  one  can  delay  identification  requires  almost  a  join-based  lattice  to  fulfill  the  hypotheses 
of  Theorem  25  on  page  47. 

If  we  did  want  to  merge  nodes  as  just  described,  while  maintaining  a  lattice,  then  we  would 
perhaps  also  merge  the  two  nodes  containing  the  set  {1,  2},  giving  us  the  lattice  of  Figure  40. 
This lattice is similar to the lattice P_{S⊔T} that one would construct from the relation S ⊔ T,
except  that  it  does  not  include  singleton  attributes  in  the  nodes  identifying  individuals  #1  and 
#2  and  it  does  not  include  the  sequence  “cc”  in  the  node  identifying  individual  #3. 


Figure  40:  Modified  lattice  of  Figure  39,  after  merging  some  nodes. 


Regardless,  the  lattices  of  Figures  39  and  40  encode  the  inferences  possible  from  the  decision 
process  of  Figure  37.  In  particular,  if  we  observe  either  attribute  “a”  or  attribute  “b”,  then  we 
know  the  set  of  possible  individuals  is  {1,2};  we  have  excluded  individual  #3.  Moreover,  if  we 
observe  any  two-attribute  sequence,  with  attributes  drawn  from  {a,  b},  then  we  can  identify 
the  observed  individual  fully  as  either  #1  or  #2.  Thus  the  required  sequences  come  directly 
from  the  decision  process,  not  requiring  an  intermediate  representation  as  a  relation. 


12.2  Lattices  of  Stochastic  Observations 

The  dynamic  sequence  perspective  incorporates  randomized  response  within  the  lattice 
framework.  Instead  of  arising  via  a  deterministic  process  as  in  Figure  37,  the  attributes  “a” 
and  “b”  could  flow  from  a  stochastic  process.  One  obtains  an  infinite  lattice  determined  by 
increasingly  longer  sequences  of  observations.  Depending  on  the  confidence  intervals  one  wishes 
to  set,  one  obtains  decision  regions  such  as  those  sketched  in  Figure  41,  with  a  central  region 
of  ambiguity,  bounded  by  regions  of  exclusion,  for  identifying  individuals. 

Figure 41: Sketch of an inference lattice for sequences of randomized response queries.

12.3  General  Inference  Lattices 

Lattices  are  useful  tools  for  inference.  Rather  than  work  with  completely  arbitrary  lattices, 
we  give  here  a  definition  that  makes  explicit  the  existence  of  two  underlying  structures  over 
which  we  wish  to  perform  inferences.  However,  we  no  longer  assume  a  pair  of  underlying 
discrete  spaces  X  and  Y  for  individuals  and  attributes,  but  instead  posit  posets  P  and  Q.  The 
connection  to  our  earlier  relational  perspective  is  that  P  would  be  the  powerset  of  X  and  Q 
the powerset of Y. By allowing potentially different posets P and Q for a given lattice L, one
can  in  some  instances  obtain  different  “views”  of  that  lattice,  thereby  increasing  flexibility  in 
the  interpretation  process.  For  instance,  Q  might  consist  of  all  sequences  up  to  a  specified 
length  or  it  might  consist  of  sets  of  such  sequences. 

Definition 28 (Inference Lattice). Let P and Q be finite posets.

An inference lattice L with respect to P and Q is a bounded lattice whose proper part L̄
consists of pairs (p, q), with p ∈ P and q ∈ Q, satisfying the following conditions:

For all (p_1, q_1) and (p_2, q_2) in L̄:

(i) (p_1, q_1) ≤_L (p_2, q_2) if and only if p_1 ≤_P p_2 and q_1 ≥_Q q_2;

(ii) (p_1, q_1) ∨_L (p_2, q_2) is either 1̂_L or a pair (p, q) ∈ L̄ such that p is an upper bound for both
p_1 and p_2 in P and q is a lower bound for both q_1 and q_2 in Q;

(iii) (p_1, q_1) ∧_L (p_2, q_2) is either 0̂_L or a pair (p, q) ∈ L̄ such that p is a lower bound for both
p_1 and p_2 in P and q is an upper bound for both q_1 and q_2 in Q.

(Note that 0̂_L <_L (p, q) <_L 1̂_L for every (p, q) ∈ L̄, given that L̄ is the proper part of L. Also
be aware that L̄ need not, and generally will not, contain all possible pairs (p, q).)

Inference Protocol: Suppose we have observed some q ∈ Q. How should we interpret
that  observation  in  terms  of  the  lattice  L?  Here  is  a  possible  protocol:  (In  terms  of  our 
earlier  relational  model,  one  may  view  this  protocol  as  inferring  sets  of  individuals  from  sets 
of  attributes.) 


DISTRIBUTION  A:  Distribution  approved  for  public  release. 



Figure  42:  Poset  P  models  some  sets  of  individuals;  poset  Q  models  some  sequences  of 
attributes. 


• Let Γ = {(p′, q′) ∈ L | q ≤_Q q′}.

• If Γ = ∅, then we view q as inconsistent, implying interpretation 0̂_L ∈ L.

• Otherwise, let Γ_max consist of all the maximal elements of Γ (maximal with respect to
the partial order on L). We view q as implying this set of elements in L. One can project
each of those elements onto its P coordinate, if that is useful.

There  is  a  dual  protocol  for  interpreting  an  observation  p  G  P: 

(In  terms  of  our  earlier  relational  model,  one  may  view  this  protocol  as  inferring  sets  of 
attributes  from  sets  of  individuals.) 

• Let Σ = {(p′, q′) ∈ L | p ≤_P p′}.

• If Σ = ∅, then we view p as inconsistent, implying interpretation 1̂_L ∈ L.

• Otherwise, let Σ_min consist of all the minimal elements of Σ (minimal with respect to
the partial order on L). We view p as implying this set of elements in L. Again, one can
project each of those elements onto its Q coordinate, if that is useful.

In our previous relational setting, the structure of Galois lattices ensured that each of Γ_max
and Σ_min never contained more than one element. That need not be true for general inference
lattices. 


Example: Suppose P and Q are as in Figure 42. Here P models subsets drawn from the
set of two individuals {1,2}, while Q models sequential observations of “a” and “b”, of lengths
one  and  two,  as  in  our  earlier  example  of  Figure  37.  The  lattice  L  is  as  in  Figure  39.  For 
presentational  simplicity,  posets  P  and  Q  ignore  individual  #3  and  attribute  “c”,  instead 
focusing  on  individuals  {1,2}  and  attributes  {a,  b}. 

Observing an attribute: Suppose we have observed attribute “b”, i.e., q = b. What can
we infer from q in P via L? Let’s follow the protocol given above:


• The subposet of Q consisting of elements q′ greater than or equal to q is:

        1̂
       /  \
     ba    bb
       \  /
        b

• Consequently, Γ is the following subposet of L:

        ({1,2}, b)
         /      \
  ({1}, ba)    ({2}, bb)

• There is one maximal element in Γ, so Γ_max = {({1,2}, b)}.

Projecting  onto  the  P  component  tells  us  how  to  interpret  q:  The  observation  “b”  must 
have come from either individual #1 or individual #2. (This conclusion would hold as well if
P  had  modeled  individual  #3  and  if  Q  had  modeled  attribute  “c”.) 

Observing  an  individual:  Suppose  we  have  observed  individual  #1,  i.e.,  p  =  {1}.  What 
can we infer from p in Q via L? Again, let’s follow the inference protocol given earlier:

• The subposet of P consisting of elements p′ greater than or equal to p is:

    {1,2}
      |
     {1}

• Consequently, Σ is the following subposet of L:

    ({1,2}, a)      ({1,2}, b)
        |               |
    ({1}, ab)       ({1}, ba)

• The minimal elements of Σ give us Σ_min = {({1}, ab), ({1}, ba)}.

Projecting  onto  the  Q  component  tells  us  how  to  interpret  p:  The  individual  observed  can 
or  did  reveal  one  of  the  two-attribute  sequences  “ab”  or  “ba” . 
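The two protocols can be sketched in Python for the example just worked. This is a minimal illustration rather than the report's implementation. The list `L_PROPER` transcribes the lattice nodes that involve individuals {1, 2}; the order on P is assumed to be set inclusion, and the order on Q is assumed to be the prefix order on sequences, with the top element of Q left out. All function names are hypothetical.

```python
# Pairs (p, q) of the lattice of Figure 39, restricted to individuals {1, 2}
# and attributes {a, b}. Transcribed from the worked example in the text.
L_PROPER = [
    (frozenset({1, 2}), "a"), (frozenset({1, 2}), "b"),
    (frozenset({1}), "ab"), (frozenset({1}), "ba"),
    (frozenset({2}), "aa"), (frozenset({2}), "bb"),
]


def leq_P(p1, p2):
    """Assumed order on P: set inclusion."""
    return p1 <= p2


def leq_Q(q1, q2):
    """Assumed order on Q: q1 <= q2 when q1 is a prefix of q2."""
    return q2.startswith(q1)


def leq_L(x, y):
    """Condition (i): (p1, q1) <= (p2, q2) iff p1 <= p2 and q1 >= q2."""
    return leq_P(x[0], y[0]) and leq_Q(y[1], x[1])


def infer_from_attribute(q):
    """Return Gamma_max; an empty set means q is inconsistent."""
    gamma = [x for x in L_PROPER if leq_Q(q, x[1])]
    return {x for x in gamma
            if not any(x != y and leq_L(x, y) for y in gamma)}


def infer_from_individuals(p):
    """Return Sigma_min; an empty set means p is inconsistent."""
    sigma = [x for x in L_PROPER if leq_P(p, x[0])]
    return {x for x in sigma
            if not any(x != y and leq_L(y, x) for y in sigma)}
```

Calling `infer_from_attribute("b")` yields the single element ({1,2}, b), and `infer_from_individuals(frozenset({1}))` yields the two elements ({1}, ab) and ({1}, ba), matching the example.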


Comment: The poset Q of Figure 42 would not be very useful for inferences in the lattice
of Figure 40, since that lattice now models attribute observations as sets of sequences rather
than merely as sequences. We would instead probably want Q to be something like the
poset of Figure 43. So even though L has become simpler than in Figure 39, Q has become
more complicated. On the other hand, the new (L, P, Q) triple means that one can infer
({1,2}, {a, b}) from the observation “b”. As before, that says the observation “b” must have
come from individual #1 or #2, but it also says that the individual could alternatively have
produced attribute “a”. In summary, by altering the triple (L, P, Q), one changes the possible
inferences.

Figure 43: Poset Q modeling sets of attribute sequences, for inferences in the lattice of
Figure 40.

The  poset  Q  of  Figure  43  is  a  conveniently  chosen  finite  subposet  of  a  particular  infinite 
poset modeling sets of sequences. In that model, each set is required to be finite and prefix-free,
meaning that if two distinct sequences appear in an element of Q, neither may be a prefix of
the other. The partial order on Q is defined by: q_1 ≤_Q q_2 precisely when every sequence in q_1
is a prefix of (possibly equal to) some sequence in q_2.
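The prefix-free condition and the partial order on sets of sequences can be written down directly. The sketch below is illustrative only; it represents sequences as Python strings and elements of Q as sets of strings, and the function names are hypothetical.

```python
def is_prefix_free(seqs):
    """True if no sequence in the set is a prefix of a different one."""
    return not any(s != t and t.startswith(s) for s in seqs for t in seqs)


def leq_Q(q1, q2):
    """q1 <= q2 when every sequence in q1 is a prefix of
    (possibly equal to) some sequence in q2."""
    return all(any(t.startswith(s) for t in q2) for s in q1)
```

For instance, {"a", "b"} lies below {"ab", "ba"} in this order, whereas {"a", "ab"} is not an admissible element because “a” is a prefix of “ab”.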

13 Lattices for Strategy Obfuscation

We  have  seen  sublattices  of  powerset  lattices,  those  being  prototypical  examples  of  Boolean 
lattices.  A  related  example  is  given  by  strategy  complexes  [6,  7],  which  may  be  viewed  as 
lattices  of  partial  orders  formed  from  potentially  stochastic  or  nondeterministic  transitions  in 
a  graph.  The  basic  elements  in  such  a  lattice  are  strategies  for  attaining  various  goals.  Our 
work  on  privacy  now  raises  the  question  of  strategy  obfuscation:  How  can  someone  reveal  the 
actions  of  a  strategy  in  a  fashion  that  delays  identification  of  the  strategy? 

13.1  Strategies  for  Nondeterministic  Graphs 


Figure 44: A graph G with three states, four deterministic actions, and one nondeterministic
action (a_3).

For  a  very  simple  example,  consider  the  graph  of  Figure  44.  We  think  of  this  graph  as 
modeling  some  kind  of  dynamic  system,  for  instance,  a  person  driving  between  three  shopping 
malls  or  a  robot  moving  among  clutter  in  a  warehouse  or  an  intruder  in  a  server  network. 

There  are  three  states  in  the  graph,  along  with  five  actions.  Each  action  has  a  source  state 
and  one  or  more  target  states.  An  action  may  be  executed  when  the  system  is  at  the  source 
of the action, causing the system to move from the action’s source to one of its target states.

Four of the actions, {a_1, a_2, a_4, a_5}, are standard deterministic directed edges, leading
for certain from one state to another. The remaining action, a_3, is nondeterministic.
Nondeterminism of a_3 means that if the system is at state 3 and executes action a_3, then
the  precise  outcome  is  uncertain:  The  system  might  move  either  to  state  1  or  to  state  2. 
Nondeterminism  is  potentially  adversarial:  The  precise  target  state  attained  is  unpredictable 
and  could  vary  nonstochastically  on  different  executions  of  the  action.  One  may  generalize  this 
idea  to  include  stochastic  actions  along  with  deterministic  and  nondeterministic  actions,  thus 
modeling  adversarial  combinations  of  Markov  chains.  We  will  not  do  so  here,  but  see  [6,  7]. 

For  our  purposes  here,  a  strategy  is  a  set  of  actions  whose  underlying  edge  set  contains  no 
cycles.  If  the  system  is  at  a  state  which  is  the  source  of  an  action  in  the  strategy,  then  the 
system  executes  that  action.  If  the  strategy  contains  multiple  actions  with  that  same  source 
state,  then  the  actual  action  executed  is  determined  nondeterministically.  For  instance,  in  the 
example, if actions a_1 and a_5 both appear in a strategy then the strategy is agnostic as to
whether  the  system  will  transition  to  state  2  or  state  3  from  state  1.  If  a  strategy  does  not 
contain  an  action  for  a  given  state,  then  the  system  will  stop  moving  if  it  is  ever  in  that  state. 

The lattice operations for strategies are set union and set intersection, with one proviso:
Suppose σ_1 and σ_2 are two strategies. Each strategy is a set of actions with no cycles in its
underlying edge set. If the union of two strategies σ_1 ∪ σ_2 contains an underlying cycle in its
edge set, then the lattice operation becomes σ_1 ∨ σ_2 = 1̂, with 1̂ the top element of the lattice.
That top element represents cyclicity. The bottom element 0̂ of the lattice is equivalent to the
empty strategy ∅, amounting to no motion.


Figure  45:  The  strategy  complex  for  the  graph  of  Figure  44.  We  have  labeled  each  maximal 
simplex  with  an  identifier,  for  the  purposes  of  Figure  46. 

Rather than draw a lattice of strategies L, it is more convenient to draw an equivalent
simplicial complex whose vertices are the (acyclic) actions 𝔄 of the graph. This simplicial
complex is denoted by Δ_G and is called the strategy complex of G. The connection is that the
proper part of the lattice is the face poset of the simplicial complex, that is L \ {0̂, 1̂} = 𝔉(Δ_G).
Figure  45  shows  the  strategy  complex  for  the  graph  of  Figure  44.  The  constituent  simplices  of 
the  strategy  complex  are  strategies,  that  is,  all  sets  of  actions  whose  underlying  edge  sets  are 
acyclic. 
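The definitions of strategy, join, and strategy complex can be exercised on a small example in Python. The graph below is hypothetical: it has three states, four deterministic actions, and one nondeterministic action out of state 3, as the text describes, but the endpoints of the deterministic actions are assumptions and the action names are not the labels of Figure 44. Reading a strategy's goal as the set of states with no action in the strategy is likewise an interpretation of the text.

```python
from itertools import combinations

STATES = {1, 2, 3}
# Hypothetical actions: name -> (source state, set of possible targets).
# Only the nondeterministic action from state 3 to {1, 2} is from the text.
ACTIONS = {
    "1->2": (1, {2}),
    "1->3": (1, {3}),
    "2->1": (2, {1}),
    "2->3": (2, {3}),
    "3->{1,2}": (3, {1, 2}),
}
TOP = "TOP"  # stands in for the top lattice element, representing cyclicity


def is_acyclic(actions):
    """True if the underlying edge set of the given actions has no cycle."""
    edges = {s: set() for s in STATES}
    for name in actions:
        source, targets = ACTIONS[name]
        edges[source] |= targets
    remaining = set(STATES)
    while True:
        # Peel off states with no outgoing edge into the remaining states.
        sinks = {s for s in remaining if not (edges[s] & remaining)}
        if not sinks:
            break
        remaining -= sinks
    return not remaining  # leftover states would all lie on or lead to a cycle


def join(s1, s2):
    """Set union, or TOP if the union contains an underlying cycle."""
    union = frozenset(s1) | frozenset(s2)
    return union if is_acyclic(union) else TOP


def strategies():
    """All simplices of the strategy complex, including the empty strategy."""
    names = sorted(ACTIONS)
    return [frozenset(c) for r in range(len(names) + 1)
            for c in combinations(names, r) if is_acyclic(c)]


def maximal(simplices):
    """Simplices not properly contained in another simplex."""
    return [s for s in simplices if not any(s < t for t in simplices)]


def goal(strategy):
    """States at which the strategy has no action, so motion ceases."""
    return STATES - {ACTIONS[name][0] for name in strategy}
```

On this assumed graph there are four maximal strategies, each with a single goal state, and every state is the goal of at least one of them.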

Figure 46: Relation A describes the strategy complex of Figure 45 in terms of its maximal
simplices  and  their  constituent  actions.  The  rightmost  column  shows  each  maximal  strategy’s 
goal,  i.e.,  that  state  at  which  motion  ceases. 

Now  that  we  have  a  simplicial  complex,  we  can  form  a  relation,  whose  “individuals”  are 
all  maximal  strategies  of  the  complex  and  whose  “attributes”  are  the  underlying  actions,  as 
shown in Figure 46. The figure also shows each maximal strategy’s goal, that is, the state at
which the strategy would stop moving. (In general, a strategy, even a maximal strategy, may
have  a  multi-state  goal  set,  but  in  this  example  the  goals  of  all  maximal  strategies  are  singleton 
states.)  We  make  the  following  observations: 

•  There  is  at  least  one  strategy  for  attaining  each  state  in  the  graph,  meaning  it  is  possible 
to  move  from  every  state  to  every  other  state,  despite  uncertainty  in  the  outcome  of  one 
of  the  actions.  Such  graphs  are  called  fully  controllable  in  [6,  7],  and  have  properties 
similar  to  those  of  strongly  connected  directed  graphs. 

•  For  each  maximal  strategy,  there  are  two  informative  attribute  release  sequences,  each 
consisting  of  two  actions.  For  instance,  for  <72  one  could  reveal  actions  03  and  04  in  either 
order,  identifying  02  only  after  revealing  both  actions.  For  <73,  one  could  reveal  actions 
ai  and  04  in  either  order,  now  identifying  <73  only  after  revealing  both  actions. 

•  Some actions reveal the goal even though they do not identify the maximal strategy. In
particular, actions a_1 and a_2 each individually reveal the goal to be 3. (The two actions
are in fact equivalent in A, in that either one implies the other.) For instance, if one
knows that a_1 is in a maximal strategy σ, then one knows that the strategy cannot also
contain a_3, as adding a_3 would create a cycle in the underlying edge set. Action a_2 must
therefore also be in the strategy, since the strategy is maximal. Consequently, the goal
is state 3 and σ is either σ_3 or σ_4. The difference between these two maximal strategies
is a choice between a_4 and a_5. That choice does not affect the final goal, but could affect
intermediate motions and the time to reach the goal. A rough analogy is knowing that
a car on a freeway must continue on the freeway until at least the next exit but has a
choice between lanes en route.

•  Each strategy has at least one informative attribute release sequence, consisting of two
actions, that does not reveal the goal until the final action has been released. For
instance, for σ_3, one could first release a_4, leaving open the possibility of either state 1
or state 3 being the goal, then subsequently release either a_1 or a_2.

Question:  Is  this  set  of  intertwined  observations  fundamental? 

Answer:  Yes,  with  certain  qualifications,  described  next. 

13.2  Connecting  the  Topologies  of  Strategy  Complexes  and  Privacy 
Notation: 

•  G = (V, 𝔄) denotes a graph with underlying states V and possibly uncertain actions 𝔄.
(For simplicity, we assume here that both V and 𝔄 are not empty.)

•  Δ_G denotes the strategy complex of G; it includes the empty strategy ∅.

Lemma 29. Let G = (V, 𝔄) be a graph as above and 𝔐 the set of maximal simplices of Δ_G.
Define relation A on 𝔐 × 𝔄 by A = {(σ, a) | a ∈ σ ∈ 𝔐}. Then Φ_A = Δ_G. In other
words, the Dowker complex over the set of actions is the same as the graph’s strategy complex.
(The lemma holds more generally for simplicial complexes. The proof is nearly definitional.)

(The “A” stands for “Action”.)
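Because Lemma 29 holds for simplicial complexes in general, it can be checked on a toy complex. The sketch below is only an illustration under stated assumptions: the two maximal simplices and the action names "a1", "a2", "a3" are hypothetical and are not the strategy complex of Figure 45, and the helper names `closure` and `dowker_over_attributes` are introduced here. It assumes the Dowker complex over attributes consists of all attribute sets that have a common witness individual, together with the empty simplex.

```python
from itertools import combinations

def closure(maximal_simplices):
    """All subsets (including the empty simplex) of the given maximal simplices."""
    faces = set()
    for m in maximal_simplices:
        members = sorted(m)
        for k in range(len(members) + 1):
            faces.update(frozenset(c) for c in combinations(members, k))
    return faces

def dowker_over_attributes(relation):
    """Attribute sets witnessed by a single individual, plus the empty simplex."""
    rows = {}
    for x, y in relation:
        rows.setdefault(x, set()).add(y)
    return closure(rows.values())

# Hypothetical complex: maximal simplices play the role of "individuals",
# their vertices play the role of "attributes".
maximal = [frozenset({"a1", "a2"}), frozenset({"a2", "a3"})]
relation_A = {(sigma, a) for sigma in maximal for a in sigma}

# Lemma 29 on this toy input: the Dowker complex equals the original complex.
same = dowker_over_attributes(relation_A) == closure(maximal)
```

The comparison is nearly definitional, which matches the remark that the proof of the lemma is nearly definitional.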

One of the fundamental results from [6, 7] is that a graph is fully controllable if and only if
its strategy complex is homotopic to a sphere of dimension two less than the number of states
in the graph. (Recall that ≃ means homotopy equivalence.)

Theorem 30. A graph G = (V, 𝔄) is fully controllable if and only if Δ_G ≃ 𝕊^(n−2), with n = |V|.

Now recall our fundamental privacy result, Corollary 26 from page 48. That corollary,
along with Theorem 30, tells us that if a graph G = (V, 𝔄) is fully controllable, then the poset
P_A formed from the relation A constructed as above must contain at least n! maximal chains,
each consisting of at least n − 1 elements, with n = |V| (recall that the number of elements in
a chain is one more than its length).

We  actually  want  a  stronger  result,  speaking  to  individual  strategies  and  we  can  get  that  by 
looking  into  the  details  of  the  proof  of  Theorem  25.  The  proof  is  an  induction  that  recursively 
considers  links,  giving  us  the  following  (see  Appendices  G  and  H): 

Theorem 31 (Delaying Strategy Identification). Let G = (V, 𝔄) be a fully controllable graph,
with n = |V| > 1. Let A be the relation constructed as in Lemma 29 and let P_A be its associated
doubly-labeled poset. Then:

For each v ∈ V, there exists a maximal strategy σ_v ∈ Δ_G for attaining singleton goal state
v such that P_A contains at least (n − 1)! distinct maximal chains for identifying σ_v, with each
chain consisting of at least n − 1 elements.

Clarifying Observation: Each maximal chain for identifying σ_v specifies at least n − 1
actions and an order for releasing them such that no action is implied by those previously
released. In particular, the sequence of actions does not identify σ_v until all actions have been
released.

Comments: Theorem 31 does not assert that every maximal strategy in Δ_G has (n − 1)!
many “long” identifying chains, merely that for every possible singleton goal v, there is some
strategy for attaining v with (n − 1)! many “long” identifying chains. It is not hard to construct
an example for which some maximal strategy with a singleton goal state has fewer than (n − 1)!
identifying chains. This fact suggests further questions. Here are two:

•  Given an arbitrary maximal strategy σ_v for attaining a singleton goal state v, can we
find at least one chain in P_A that identifies σ_v but requires release of at least n − 1
actions before doing so? We do not know the answer in general, although small examples
suggest the answer is “yes”. We do have a proof that the answer is “yes” when the
graph contains a Hamiltonian cycle consisting of edges that come from deterministic or
stochastic actions.

•  Given a singleton goal state v, can we find at least one maximal strategy σ_v and at
least one chain in P_A that eventually identifies σ_v, but does not reveal the goal v before
releasing at least n − 1 actions? The answer to this question is “yes”. The proof operates
by repeatedly creating quotient graphs. In forming a quotient graph, the proof regards
as equivalent a certain set of states that are connected by a cycle of edges, with each
edge coming from some deterministic or stochastic action. For instance, in the graph of
Figure 44, the proof would regard states 1 and 2 as equivalent. The resulting quotient
graph would then consist of two states with deterministic actions between them, since
action a_3 becomes a deterministic transition in the quotient graph. Inductively, one
therefore sees that an entity can hide its true goal until at least two actions in the
original graph G have been revealed. (See Appendix H for further details.)

A comment/caution regarding the availability of many chains: The (n − 1)! chains
mentioned above may come from all possible permutations of the same underlying set of n − 1
actions. Alternatively, these (n − 1)! chains may involve creative sequencing of more than n − 1
actions. The precise makeup of the chains depends on the underlying homology generators.
However, even if the chains are merely reordering the same n − 1 actions, there is good
reason to take advantage of that capability, rather than pick one particular sequence via a
deterministic algorithm. The reason is that knowledge of how an algorithm releases actions
may leak information to an adversary. Such leakage may be understood as changing the
effective relation. For instance, despite thinking one is working with relation A, a particular
release protocol may simply be focusing on some proper subset of A or some proper subset
of the poset P_A, possibly resulting in very different inference characteristics. A good release
strategy may be to choose randomly from among the (n − 1)! possible chains. In that way, one
is taking good advantage of the spherical homogeneity suggested by homology.
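As a minimal sketch of the random release strategy, consider the special case in which the (n − 1)! chains are exactly the permutations of one fixed set of n − 1 actions. That special case is an assumption of this example; when the chains involve more than n − 1 actions, one would instead sample from an explicit list of chains. The function name `random_release_order` and the action labels are hypothetical.

```python
import secrets

def random_release_order(actions):
    """Return a uniformly random ordering of the given actions.

    Assumes every permutation of `actions` is an informative release
    sequence, i.e. the chains are all reorderings of one action set.
    """
    order = list(actions)
    # SystemRandom draws from the operating system's entropy source, so the
    # order is not a deterministic function of a guessable seed.
    secrets.SystemRandom().shuffle(order)
    return order

# Example with hypothetical action labels for a graph with n = 4 states.
order = random_release_order(["a1", "a2", "a3"])
```

Drawing from an operating-system entropy source rather than a seeded generator is one way to avoid the leakage described above, since an adversary who knows the algorithm still cannot predict the order.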
14  Relations  as  a  Category 

We  have  discussed  disinformation,  obfuscation,  and  other  manipulation  of  relations.  The  goal 
of  such  transformations  has  been  to  preserve  privacy  by  removing  free  faces.  We  have  not 
yet  discussed  such  transformations  formally.  For  instance,  the  coordinate  transformations  of 
Section  9  raise  the  question: 

How  should  one  think  about  maps  between  relations? 

14.1  Relationship-Preserving  Morphisms 

Traditionally,  relations  are  morphisms  between  sets  (with  functions  a  special  case).  In  thinking 
about  privacy,  it  is  useful  to  define  a  category  in  which  the  relations  are  the  objects.  We 
have  some  choices  in  defining  morphisms  for  this  category.  Bearing  in  mind  our  Dowker 
constructions,  we  make  the  following  definitions. 

Notation: We frequently will be working with two relations: R is a relation on X^R × Y^R
and Q is a relation on X^Q × Y^Q (the superscripts are just indices to indicate the underlying
relation). In order to distinguish rows and columns between the two, we will also use notation
of the form X^R_y, Y^R_x, X^Q_y, and Y^Q_x.

Definition 32 (Morphism). Let R be a relation on X^R × Y^R and let Q be a relation on
X^Q × Y^Q. A morphism of relations f : R → Q is a pair of set functions:

f_X : X^R → X^Q

f_Y : Y^R → Y^Q

such that (f_X(x), f_Y(y)) ∈ Q whenever (x, y) ∈ R.

In  other  words,  a  morphism  of  relations  maps  individuals  to  individuals  and  attributes  to 
attributes  in  a  way  that  preserves  relationships. 
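The defining condition of Definition 32 is easy to test mechanically. The following sketch represents a relation as a set of (individual, attribute) pairs and each component function as a dictionary; this representation and the name `is_morphism` are choices made for the example, and the two small relations are the ones that appear later in Figure 50.

```python
def is_morphism(R, Q, f_X, f_Y):
    """True if (f_X(x), f_Y(y)) is in Q for every pair (x, y) in R."""
    return all((f_X[x], f_Y[y]) in Q for (x, y) in R)

# R: individual 1 has attribute a.  Q: individual 1 has a and b, individual 2 has a.
R = {(1, "a")}
Q = {(1, "a"), (1, "b"), (2, "a")}

# Inclusion of R into Q preserves the single relationship of R.
inclusion_ok = is_morphism(R, Q, {1: 1}, {"a": "a"})

# Sending 1 to 2 and a to b does not: the pair (2, b) is absent from Q.
broken = is_morphism(R, Q, {1: 2}, {"a": "b"})
```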

The  following  lemma  follows  from  the  definitions  (a  proof  appears  in  Appendix  I): 

Lemma 33 (Induced Simplicial Maps). A morphism f : R → Q between nonvoid relations
induces simplicial maps between the Dowker complexes:

f_X : Ψ_R → Ψ_Q

f_Y : Φ_R → Φ_Q

Notational comment: The symbols f_X and f_Y are overloaded intentionally. The
simplicial map f_X is precisely the set map f_X applied to the vertices of any simplex: If
σ = {x_0, ..., x_k} ∈ Ψ_R, then f_X(σ) = {f_X(x_0), ..., f_X(x_k)} ∈ Ψ_Q. Similarly for f_Y.
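A small sketch can make the induced map concrete. It assumes that Ψ consists of all sets of individuals that share at least one attribute, together with the empty simplex. The relations used are the relation Q' of Figure 47 and a one-bit relation S whose two individuals are labeled 1 and 2; that labeling of S, the string "~a" for the negated attribute, and the function names are assumptions of this example.

```python
from itertools import combinations

def dowker_over_individuals(relation):
    """Sets of individuals sharing at least one attribute, plus the empty simplex."""
    columns = {}
    for x, y in relation:
        columns.setdefault(y, set()).add(x)
    simplices = {frozenset()}
    for members in columns.values():
        ordered = sorted(members)
        for k in range(1, len(ordered) + 1):
            simplices.update(frozenset(c) for c in combinations(ordered, k))
    return simplices

def induced_map(f, simplex):
    """Apply a vertex map to every vertex of a simplex."""
    return frozenset(f[v] for v in simplex)

Q_prime = {(1, "a"), (2, "a"), (3, "~a"), (4, "~a")}
S = {(1, "a"), (2, "~a")}
f_X = {1: 1, 2: 1, 3: 2, 4: 2}   # with f_Y the identity, this is a morphism Q' -> S

psi_Q_prime = dowker_over_individuals(Q_prime)
psi_S = dowker_over_individuals(S)

# Lemma 33 on this input: every simplex maps to a simplex.
images = {induced_map(f_X, s) for s in psi_Q_prime}
images_are_simplices = images <= psi_S
```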

Intuitively,  one  cannot  partition  the  individuals  of  a  connected  relation  into  two  or  more 
classes  without  misclassifying  or  ignoring  at  least  some  relationships.  A  graph  connectivity 
argument  provides  a  possible  proof.  Lemma  33  provides  another,  with  additional  insight.  Let’s 
look  at  some  examples: 
Two Bits onto One: Consider again the relations S and Q of Figures 15 and 16, respectively,
on page 30. Relation S models a one-bit relation: an attribute and its negation. Relation
Q models a two-bit relation: two attributes and their negations. The Dowker complexes for
S have 𝕊^0 homotopy type, while those for Q have 𝕊^1 homotopy type. We can think of S as
a classification, splitting individuals into those that have some attribute a and those that do
not.

By Lemma 33, a morphism f : Q → S induces simplicial (hence continuous) maps between
the corresponding Dowker complexes of S and Q. Since 𝕊^1 is connected but 𝕊^0 is not, there
is no surjective continuous function from 𝕊^1 to 𝕊^0. Consequently, no morphism f : Q → S
can truly be a classification: f_Y can map all four attributes {a, ¬a, b, ¬b} of Q to the single
attribute a or all four attributes to ¬a, but f_Y cannot map to both a and ¬a.
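For relations this small, the impossibility can also be confirmed by exhaustive search. The sketch below enumerates every pair of set functions and keeps those satisfying Definition 32. The encodings are assumptions, since Figures 15 and 16 are not reproduced in this section: Q is taken to have individuals 1 to 4 with a held by {1, 2}, ¬a by {3, 4}, b by {1, 3}, and ¬b by {2, 4} (consistent with the relationships described after Figure 47), and S is taken to have individual 1 with a and individual 2 with ¬a. Negation is written "~".

```python
from itertools import product

Q = {(1, "a"), (2, "a"), (3, "~a"), (4, "~a"),
     (1, "b"), (3, "b"), (2, "~b"), (4, "~b")}
S = {(1, "a"), (2, "~a")}

def morphisms(R, T):
    """Yield every pair (f_X, f_Y) of set functions that is a morphism R -> T."""
    X_R = sorted({x for x, _ in R})
    Y_R = sorted({y for _, y in R})
    X_T = sorted({x for x, _ in T})
    Y_T = sorted({y for _, y in T})
    for xs in product(X_T, repeat=len(X_R)):
        f_X = dict(zip(X_R, xs))
        for ys in product(Y_T, repeat=len(Y_R)):
            f_Y = dict(zip(Y_R, ys))
            if all((f_X[x], f_Y[y]) in T for x, y in R):
                yield f_X, f_Y

found = list(morphisms(Q, S))
# The set of attributes of S actually used by each morphism.
attribute_images = [set(f_Y.values()) for _, f_Y in found]
```

Under these encodings the search inspects 2^4 · 2^4 = 256 candidate pairs, so it finishes immediately; the same brute-force approach would not scale to large relations.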


Figure 47: Relation Q' obtained from relation Q of Fig. 16 by discarding attributes b and ¬b.


This impossibility may at first seem paradoxical. After all, one can simply cut relation
Q down the middle and throw away the columns involving attributes b and ¬b, as shown
in Figure 47. After that, a surjective morphism f : Q' → S is immediate. Indeed, that is
possible. However, in so doing, one has discarded some relationships, perhaps purposefully,
perhaps accidentally. In particular, the relationship between individuals #1 and #3 of Q via
attribute b is lost, as is the relationship between individuals #2 and #4 via attribute ¬b.
This reasoning simply underscores the fact that morphisms of relations preserve relationships.
Lack of continuity in a function therefore is a sign that one is discarding some relationships.
Whether such discard is desirable depends on one’s goals in a particular application.


Three Bits onto Two: Recall as well Figure 17, which depicts a three-bit relation R:
three attributes and their negations, capable of distinguishing between eight individuals. The
homotopy type of the Dowker complexes is 𝕊^2. With Q as above, the following question arises
naturally when trying to reduce complexity of data yet preserve information:

Does there exist a surjective morphism f : R → Q?

Unlike the previous example, there do exist continuous maps from 𝕊^2 onto 𝕊^1, so perhaps
one can find a surjective morphism f : R → Q. In fact, one cannot, for dimensional reasons
that force an equator of 𝕊^2 to become a homology generator of 𝕊^1. A simplex-based argument
goes as follows:

•  Suppose surjective f : R → Q exists. As will be discussed later (see page 74), this means
the component functions f_X : Ψ_R → Ψ_Q and f_Y : Φ_R → Φ_Q are surjective as set maps.

•  One may therefore assume without loss of generality that f_Y(a) = a and f_Y(b) = b.

•  The triangles {a, b, c} and {a, b, ¬c} are both simplices in Φ_R. The maximal simplices
of Φ_Q are edges.

•  By Lemma 33, this means that f_Y(c) and f_Y(¬c) are both elements of {a, b} in Φ_Q.

•  Again by surjectivity, we therefore see that {f_Y(¬a), f_Y(¬b)} = {¬a, ¬b}.

•  Another triangle-versus-edge argument then says that f_Y(c) and f_Y(¬c) are both
elements of {¬a, ¬b}, giving us a contradiction.


Of  course,  as  in  constructing  Q'  of  Figure  47,  if  we  are  willing  to  tolerate  discontinuities,  we 
could  discard  one  attribute  and  its  negation  to  obtain  Q  from  R.  As  before,  discontinuity  means 
losing  awareness  of  some  relationship(s).  For  instance,  if  we  omit  attribute  c,  we  would  become 
unaware  in  Q  of  the  relationship  that  exists  in  R  among  the  set  of  individuals  {1, 3, 5,  7}. 
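The simplex-based argument can be cross-checked by brute force on the attribute side alone. The sketch below assumes that in a k-bit relation each individual holds, for every attribute, either the attribute or its negation, with one individual per sign pattern; the exact numbering of individuals in Figures 16 and 17 does not matter for this check. It uses the fact stated in the first bullet that a surjective morphism forces f_Y to be surjective as a set map, and it observes that a candidate f_Y extends to some morphism exactly when it sends every row of R inside a row of Q. The helper names are hypothetical.

```python
from itertools import product

def bit_rows(names):
    """One row per sign pattern: each row holds n or ~n for every name n."""
    return [frozenset(n if keep else "~" + n for n, keep in zip(names, signs))
            for signs in product([False, True], repeat=len(names))]

rows_R = bit_rows("abc")            # 8 individuals over 6 attributes
rows_Q = bit_rows("ab")             # 4 individuals over 4 attributes
Y_R = sorted(set().union(*rows_R))
Y_Q = sorted(set().union(*rows_Q))

def extends_to_morphism(f_Y):
    """True if every row of R maps inside some row of Q under f_Y."""
    return all(any({f_Y[y] for y in row} <= target for target in rows_Q)
               for row in rows_R)

# All 4^6 = 4096 attribute maps; keep those that are surjective and extendable.
surjective_candidates = [
    images for images in product(Y_Q, repeat=len(Y_R))
    if set(images) == set(Y_Q) and extends_to_morphism(dict(zip(Y_R, images)))
]
```

An empty list of candidates means that no morphism R → Q has a surjective attribute map, and hence that no surjective morphism exists, in agreement with the contradiction derived above.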


14.2  Privacy-Establishing  Morphisms 
Figure 48: Relation M is isomorphic to relation G of Figure 23 on page 37, now without the
author-book semantics. The Dowker complexes are dual triangulations of the Möbius strip,
with 𝕊^1 homotopy type.


Relations involve two spaces. Looking at just Ψ or just Φ may hide some interesting
properties. For instance, consider the Möbius strip relation M of Figure 48. We encountered
this relation previously, in Section 10.

We  might  want  to  try  to  remove  some  of  the  inferences  discussed  in  Section  10  by  reshaping 
the  underlying  relation  without  discarding  any  relationships.  Doing  so  leads  to  the  following 
question: 


Does there exist a surjective morphism f : M → T, with T a
relation that preserves attribute and association privacy?

In  Section  8,  we  mentioned  that  any  such  T  must  have  the  topology  of  either  a  linear  cycle 
or  a  spherical  boundary  complex.  It  turns  out  that  the  answer  to  this  question  is  “yes”  and 
that  the  relevant  T  creates  Dowker  complexes  that  are  boundaries  of  tetrahedra  (see  Figure  30 
on  page  45). 
This construction is not immediately obvious from the complexes Ψ_M and Φ_M. Although
those simplicial complexes are 2-dimensional, suggesting that their triangles can be wrapped
around a tetrahedron, doing so actually collapses two of the five triangles to edges. Indeed,
the component functions for one such surjective morphism f : M → T are:


f_X : X^M → X^T          f_Y : Y^M → Y^T
    1 ↦ 4                    a ↦ a
    2 ↦ 1                    b ↦ b
    3 ↦ 2                    c ↦ c
    4 ↦ 3                    d ↦ d
    5 ↦ 4                    e ↦ a


The induced simplicial maps act on the five maximal simplices of Ψ_M and Φ_M as follows:

f_X : Ψ_M → Ψ_T              f_Y : Φ_M → Φ_T
{1,2,3} ↦ {1,2,4}            {a,b,c} ↦ {a,b,c}
{2,3,4} ↦ {1,2,3}            {b,c,d} ↦ {b,c,d}
{3,4,5} ↦ {2,3,4}            {c,d,e} ↦ {a,c,d}
{1,4,5} ↦ {3,4}              {a,d,e} ↦ {a,d}
{1,2,5} ↦ {1,4}              {a,b,e} ↦ {a,b}

Even though f_X and f_Y are surjective as set maps on the vertices of the Dowker complexes,
they are not surjective as simplicial maps on the complexes themselves. Each only covers 3
of the 4 triangles comprising its image tetrahedron. At first glance it may therefore seem
that the morphism f : M → T resulting from f_X and f_Y does not achieve the desired
privacy preservation. A closer look, however, reveals that f is actually surjective as a map of
relations: it maps all the elements of M onto the elements of T. Therefore, it does represent a
transformation that achieves privacy preservation.

In  order  to  understand  this  paradox,  imagine  again  that  M  represents  an  authorship 
database.  Think  of  the  maps  fx  and  fy  as  quotient  maps,  in  this  case  equating  authors 
1  and  5  and  books  a  and  e.  The  equivalencing  of  authors  might  constitute  a  recognition  of 
pseudonyms.  The  equivalencing  of  books  might  represent  a  generalization  from  titles  to  genres. 
Such  changes  of  resolution,  carefully  chosen,  perhaps  based  on  external  structure,  can  preserve 
relationships  while  reducing  recognition  and  inference  granularity. 
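The claims about this morphism can be verified directly. In the sketch below, the rows of M and T are reconstructions rather than copies of Figures 48 and 30: they are read off from the maximal simplices listed above and from the elements of Table 9, and should be treated as assumptions. The helper names `relation`, `rows`, and `columns` are introduced for this example.

```python
def relation(row_table):
    """Build a set of (individual, attribute) pairs from a row table."""
    return {(x, y) for x, ys in row_table.items() for y in ys}

# Assumed rows: Moebius strip relation M and tetrahedral relation T.
M = relation({1: "abe", 2: "abc", 3: "bcd", 4: "cde", 5: "ade"})
T = relation({1: "abc", 2: "bcd", 3: "acd", 4: "abd"})

f_X = {1: 4, 2: 1, 3: 2, 4: 3, 5: 4}
f_Y = {"a": "a", "b": "b", "c": "c", "d": "d", "e": "a"}

# f as a set map on pairs: (x, y) -> (f_X(x), f_Y(y)).
image_pairs = {(f_X[x], f_Y[y]) for x, y in M}
is_morphism = image_pairs <= T
is_surjective_on_pairs = image_pairs == T

def columns(R):
    """Maximal simplices of Psi when no column contains another (true here)."""
    cols = {}
    for x, y in R:
        cols.setdefault(y, set()).add(x)
    return {frozenset(c) for c in cols.values()}

def rows(R):
    """Maximal simplices of Phi when no row contains another (true here)."""
    table = {}
    for x, y in R:
        table.setdefault(x, set()).add(y)
    return {frozenset(r) for r in table.values()}

# Triangles of each image tetrahedron that are hit by the induced maps.
covered_in_psi = {frozenset(f_X[x] for x in s) for s in columns(M)} & columns(T)
covered_in_phi = {frozenset(f_Y[y] for y in s) for s in rows(M)} & rows(T)
```

With these inputs, f is a morphism that hits every pair of T, while each induced simplicial map reaches only three of the four triangles of its target, which is the apparent paradox discussed above.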

14.3  Summary  of  Morphism  Properties 

Definition 32 defines a morphism of relations f : R → Q in terms of underlying set functions
f_X : X^R → X^Q and f_Y : Y^R → Y^Q. These set functions further induce simplicial maps
f_X : Ψ_R → Ψ_Q and f_Y : Φ_R → Φ_Q. The previous subsections spoke of surjectivity in varying
contexts. Similarly, one could speak of maps as being one-to-one in varying contexts. Finally,
one also speaks of morphisms as being epimorphisms and monomorphisms. This subsection
summarizes how these properties relate for the various maps. See Appendix I for proofs.

First,  some  definitional  context  and  reminders: 

•  Suppose f : R → Q is a morphism of relations. Recall from category theory that f is an
epimorphism if, for any pair of morphisms g, h : Q → S, g ∘ f = h ∘ f implies g = h.

Recall further that a morphism f : R → Q is a monomorphism if, for any pair of
morphisms g, h : S → R, f ∘ g = f ∘ h implies g = h.

•  A morphism of relations f : R → Q is also a set map between the set of pairs comprising
R and the set of pairs comprising Q. Specifically, f(x, y) = (f_X(x), f_Y(y)).

One may speak of f as being surjective and/or one-to-one, meaning as a set map.

•  The functions f_X : X^R → X^Q and f_Y : Y^R → Y^Q are set maps. One may speak of them
as being surjective and/or one-to-one.

•  One may also ask whether the induced simplicial maps f_X : Ψ_R → Ψ_Q and f_Y : Φ_R → Φ_Q
are surjective and/or injective as maps between simplicial complexes viewed as sets.

Lemma 34 (Morphism Properties). Assume the notation from above and that all relevant
relations are nonvoid. Let f : R → Q be a morphism of relations (as per Definition 32). Then:

(i) f_X and f_Y are one-to-one set maps ⟹ f is one-to-one ⟺ f is a monomorphism.

(ii) f surjective ⟹ f epimorphism ⟺ f_X and f_Y are surjective set maps.

(Additional conditions for that last ⟺: The ⟹ direction assumes that Q has no blank
rows or columns, while the ⟸ direction assumes that R has no blank rows or columns.)

The two uni-directional implications ⟹ above are strict.

(iii) If f_X : Ψ_R → Ψ_Q is surjective and Q has no blank rows, then f_X : X^R → X^Q is
surjective.

Similarly for f_Y, now assuming that Q has no blank columns.

The converses need not hold. Indeed, f itself can be surjective but the maps of simplicial
complexes need not be (as we saw with the maps of page 73).

(iv) If f_X : X^R → X^Q is one-to-one, then f_X : Ψ_R → Ψ_Q is injective. The converse holds if
R has no blank rows.

Similarly for f_Y, now assuming that R has no blank columns for the converse.
14.4 G-morphisms

Since a relation R defines a poset P_R, rather than merely create morphisms from set maps
between individuals and attributes as in Definition 32, we may broaden the definition by
considering maps between posets:


Definition 35 (G-Morphism). Let R and Q be relations.

A G-morphism f : R → Q is any poset map f : P_R → P_Q.

Comments: The “G” stands for “Galois”. We might have insisted that a G-morphism
R → Q be a lattice morphism rather than merely a poset map P_R → P_Q, but
that might be too restrictive. Instead, as subsequent lemmas will describe, we view a
G-morphism as providing homotopy flexibility. In particular, a morphism between relations as
per Definition 32 induces two homotopic G-morphisms. The lattice structure of the image is
relevant in that it allows one to fill in elements not directly in the image of any one poset map,
as will become apparent in Theorem 40.


Figure 49: Diagram showing the poset maps f_X and f_Y induced by a morphism f : R → Q,
along with the homotopy equivalences between each relation’s face posets. (The diagram need
not be commutative, but is almost so; see Lemma 36.)


Recall that a morphism f : R → Q as per Definition 32 is built from two set maps f_X
and f_Y and that these set maps induce simplicial maps between the Dowker complexes, as per
Lemma 33. We may therefore further regard f_X and f_Y as poset maps between the face posets
of the Dowker complexes: f_X : 𝔉(Ψ_R) → 𝔉(Ψ_Q) and f_Y : 𝔉(Φ_R) → 𝔉(Φ_Q). Consequently
we have a diagram of maps as in Figure 49. The diagram need not be commutative, but the
following containments hold:

Lemma 36 (Containment). Let f : R → Q be a morphism of nonvoid relations. Then:

(a) (f_Y ∘ φ_R)(σ) ⊆ (φ_Q ∘ f_X)(σ), for every σ ∈ Ψ_R,

(b) (f_X ∘ ψ_R)(γ) ⊆ (ψ_Q ∘ f_Y)(γ), for every γ ∈ Φ_R.
As a corollary, we see that the diagram of Figure 49 describes two pairs of homotopic maps:

Corollary 37 (Homotopic Face Maps). Let f : R → Q be a morphism of nonvoid relations.
Then:

(a) f_X and ψ_Q ∘ f_Y ∘ φ_R are homotopic poset maps 𝔉(Ψ_R) → 𝔉(Ψ_Q),

(b) f_Y and φ_Q ∘ f_X ∘ ψ_R are homotopic poset maps 𝔉(Φ_R) → 𝔉(Φ_Q).

The images of the compositions that appear in Corollary 37 may be regarded as lying in
P_Q. We may further restrict the domain of these maps to be P_R, giving us the following
G-morphisms:

Definition 38 (Induced G-Morphism). A morphism of relations f : R → Q induces two
G-morphisms P_R → P_Q, defined as follows:

f^g_X = (ψ_Q ∘ f_Y ∘ φ_R)|_{P_R}          f^g_Y = (φ_Q ∘ f_X ∘ ψ_R)|_{P_R}.

(The g superscript stands for “Galois” while the vertical bar | means “restricted to”. See
also Appendix I.2.)

Corollary 39 (Homotopic Poset Maps). Let f : R → Q be a morphism of nonvoid relations.
The induced G-morphisms f^g_X, f^g_Y : P_R → P_Q are homotopic.

Corollary 39 says that we may view the underlying maps f_X and f_Y of a morphism f as
mapping any inference-closed set (viewed either as a set of individuals or as a set of attributes)
from the domain of f to an interval of inference-closed sets in the codomain of f.


R | a            Q | a  b
1 | •            1 | •  •
                 2 | •

P_R:  (1, a)          P_Q:  (12, a)
                               |
                            (1, ab)


Figure 50: Relation R is a subrelation of Q. How should one embed P_R into P_Q? There are
two possible embeddings, related by a homotopy.


For a simple example, see Figure 50. One may regard relation R as a subrelation of Q,
then define f : R → Q to be inclusion. For instance, maybe R and Q represent individuals
#1 and #2 at two parties a and b, with R representing known parties and party-attendees at
some time and Q representing an update of that information at a later time. Observe that:


f^g_X((1, a)) = (ψ_Q ∘ f_Y ∘ φ_R)({1}) = (ψ_Q ∘ f_Y)({a}) = ψ_Q({a}) = {1, 2} “=” (12, a),
f^g_Y((1, a)) = (φ_Q ∘ f_X ∘ ψ_R)({a}) = (φ_Q ∘ f_X)({1}) = φ_Q({1}) = {a, b} “=” (1, ab).

The last equality in each row indicates how to view the image element on the left of the “=”
as an element of the poset P_Q.

Both f^g_X and f^g_Y tell us how to update inference-closed sets from P_R into inference-closed
sets within P_Q:

•  The map f^g_X updates association inferences while holding observed attributes fixed. In
this example, based on initial information (relation R), we know that person #1 attended
party a. Once we update that information (relation Q) we can conclude that person #2
also attended a party at which person #1 was present.

•  Similarly, the map f^g_Y updates attribute inferences while holding observed individuals
fixed. In this example, updated information allows us to conclude that person #1
attended not only party a but also party b.

In general, for any fixed element of P_R, the two maps may give different results, but those
results are comparable in P_Q. Here f was inclusion, so we could speak of holding attributes or
individuals “fixed”. More generally, “fixed” is replaced by whatever f does.
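The two induced G-morphisms of the Figure 50 example can be computed in a few lines. The sketch assumes, consistently with the worked equations above, that φ_R(σ) is the set of attributes shared by every individual in σ and that ψ_R(γ) is the set of individuals having every attribute in γ. The function names `phi`, `psi`, `g_X`, and `g_Y` are chosen for this example, and each image is returned as a pair (individuals, attributes).

```python
def phi(R, sigma):
    """Attributes shared by every individual in sigma."""
    attributes = {y for _, y in R}
    return frozenset(y for y in attributes if all((x, y) in R for x in sigma))

def psi(R, gamma):
    """Individuals that have every attribute in gamma."""
    individuals = {x for x, _ in R}
    return frozenset(x for x in individuals if all((x, y) in R for y in gamma))

R = {(1, "a")}
Q = {(1, "a"), (1, "b"), (2, "a")}
f_X = {1: 1}        # inclusion on individuals
f_Y = {"a": "a"}    # inclusion on attributes

def g_X(sigma):
    """(psi_Q o f_Y o phi_R)(sigma), returned as an element of P_Q."""
    individuals = psi(Q, {f_Y[y] for y in phi(R, sigma)})
    return individuals, phi(Q, individuals)

def g_Y(gamma):
    """(phi_Q o f_X o psi_R)(gamma), returned as an element of P_Q."""
    attributes = phi(Q, {f_X[x] for x in psi(R, gamma)})
    return psi(Q, attributes), attributes

image_under_g_X = g_X({1})      # corresponds to (12, a)
image_under_g_Y = g_Y({"a"})    # corresponds to (1, ab)
```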

14.5  Surjectivity  Revisited 

A paradox: We saw on page 73 a surjective morphism f, from the Möbius strip relation of
Figure 48 to the tetrahedral relation of Figure 30, whose induced simplicial maps f_X : Ψ_M →
Ψ_T and f_Y : Φ_M → Φ_T were not surjective. This raises some questions:

1. Are the induced poset maps f^g_X, f^g_Y : P_M → P_T surjective?

2. If not, how can one speak of a surjective morphism?

(Note that P_M is isomorphic to P_G as shown in Figure 25 on page 39. A rendering would
be identical, except for lowercase letters in place of uppercase ones. The lattice P_T appears in
Figure 31 on page 46.)

The answer to Question 1 is that the two poset maps are not surjective. Observe in Table 9,
for instance, that the image of f^g_X does not include (4, abd). Similarly, the image of f^g_Y does
not include (134, a).

These missing elements are in the image of both maps together, viewed as a pair of
homotopic maps, as per Corollary 39. Unfortunately, that explanation is not a full answer
to Question 2. For instance, neither map’s image includes the element (13, ac) of P_T, nor does
that element appear in any interval [f^g_Y(p), f^g_X(p)] as p varies throughout P_M.

To answer Question 2, the lattice structure of P_T is useful. In the example, the image of f^g_X
includes all elements of P_T that correspond to maximal simplices of Ψ_T. Similarly, the image
of f^g_Y includes all elements of P_T that correspond to maximal simplices of Φ_T. Intuitively,
we therefore expect that the lattice operations (which correspond to intersection in either Ψ_T
or Φ_T) will generate all the elements of P_T. In that sense, the surjectivity of f appears as
surjectivity of each of f^g_X and f^g_Y, once one completes their images under lattice operations.
p             f^g_X(p)      f^g_Y(p)
(12, ab)      (14, ab)      (14, ab)
(2, abc)      (1, abc)      (1, abc)
(123, b)      (124, b)      (124, b)
(23, bc)      (12, bc)      (12, bc)
(3, bcd)      (2, bcd)      (2, bcd)
(234, c)      (123, c)      (123, c)
(34, cd)      (23, cd)      (23, cd)
(4, cde)      (3, acd)      (3, acd)
(345, d)      (234, d)      (234, d)
(45, de)      (34, ad)      (34, ad)
(5, ade)      (34, ad)      (4, abd)
(145, e)      (134, a)      (34, ad)
(15, ae)      (134, a)      (4, abd)
(1, abe)      (14, ab)      (4, abd)
(125, a)      (134, a)      (14, ab)

Table 9: Each p is of the form (σ, γ) ∈ P_M. The elements f^g_X(p) and f^g_Y(p) lie in P_T. See
also Figures 25 and 31, on pages 39 and 46, respectively. (As in those figures, the table elides
commas and braces from set notation.)
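The entries of Table 9 can be recomputed from the two relations and the component maps. The sketch below makes several assumptions: the rows of M and T are reconstructed from the table itself rather than taken from Figures 48 and 30, an element of a poset P_R is taken to be a pair (σ, γ) of nonempty sets with γ = φ_R(σ) and σ = ψ_R(γ), and all function names are introduced for this example.

```python
from itertools import combinations

def relation(row_table):
    return {(x, y) for x, ys in row_table.items() for y in ys}

M = relation({1: "abe", 2: "abc", 3: "bcd", 4: "cde", 5: "ade"})   # assumed rows
T = relation({1: "abc", 2: "bcd", 3: "acd", 4: "abd"})             # assumed rows
f_X = {1: 4, 2: 1, 3: 2, 4: 3, 5: 4}
f_Y = {"a": "a", "b": "b", "c": "c", "d": "d", "e": "a"}

def phi(R, sigma):
    """Attributes shared by every individual in sigma."""
    return frozenset(y for y in {b for _, b in R} if all((x, y) in R for x in sigma))

def psi(R, gamma):
    """Individuals that have every attribute in gamma."""
    return frozenset(x for x in {a for a, _ in R} if all((x, y) in R for y in gamma))

def poset(R):
    """All pairs (sigma, gamma), both nonempty, closed under phi and psi."""
    individuals = sorted({x for x, _ in R})
    elements = set()
    for k in range(1, len(individuals) + 1):
        for sigma in combinations(individuals, k):
            gamma = phi(R, sigma)
            if gamma:
                elements.add((psi(R, gamma), gamma))
    return elements

def g_X(p):
    _, gamma = p
    individuals = psi(T, {f_Y[y] for y in gamma})
    return individuals, phi(T, individuals)

def g_Y(p):
    sigma, _ = p
    attributes = phi(T, {f_X[x] for x in sigma})
    return psi(T, attributes), attributes

def element(individuals, attributes):
    return frozenset(individuals), frozenset(attributes)

P_M, P_T = poset(M), poset(T)
image_X = {g_X(p) for p in P_M}
image_Y = {g_Y(p) for p in P_M}
missing_from_both = P_T - (image_X | image_Y)
```

Inspecting `missing_from_both` shows which elements of P_T are reached by neither map and must therefore be recovered through lattice operations, as the discussion of Question 2 suggests.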


The  following  theorem  summarizes  the  intuition  of  the  previous  pages: 

Theorem 40 (Lattice Surjectivity). Let R and Q be nonvoid relations with no blank rows or
columns. Suppose f : R → Q is a surjective morphism (in the sense of Definition 32). For
any q ∈ P_Q:

q = ⋀_j ⋁_i q_ji, with each q_ji in the image of f^g_X : P_R → P_Q,

q = ⋁_k ⋀_ℓ q'_kℓ, with each q'_kℓ in the image of f^g_Y : P_R → P_Q.
A Preliminaries

Assumption: All simplicial complexes, relations, posets, and lattices in this document are
finite.

A.1 Simplicial Complexes

We  largely  follow  the  notation  of  [14]  and  [1]. 

•  An  (abstract)  simplicial  complex  E  with  underlying  vertex  set  A  is  a  collection  of  finite 
subsets  of  X,  such  that  if  a  is  in  E  then  so  is  every  subset  of  a.  The  elements  of  E  are 
simplices.  We  allow  the  empty  set  0  to  be  a  simplex  in  E,  for  combinatorial  reasons.  We 
refer  both  to  the  elements  of  a  simplex  and  to  singleton  simplices  as  vertices.  Not  all 
elements  of  X  need  to  be  vertices  of  E. 

•  We let $\mathrm{verts}(\Sigma)$ denote the set of vertices that actually appear in $\Sigma$, called the zero-skeleton of $\Sigma$. [The standard notation is $\Sigma^{(0)}$ but that conflicts with some iterative notation in the proof of Theorem 25.]

•  The dimension of a simplex $\sigma$ is one less than its cardinality. The empty simplex $\emptyset$ has dimension $-1$. If a simplex has dimension $k$ we sometimes call it a $k$-simplex.

•  The void complex $\emptyset$ has no simplices in it. The void complex is degenerate. The empty complex $\{\emptyset\}$ consists solely of the empty simplex. The empty complex represents the empty topological space. It is also the sphere of dimension $-1$, written $\mathbb{S}^{-1}$. (There could be different instances of the void or empty complex, depending on the underlying vertex set $X$, though frequently one takes that to be empty.)

•  A simplex $\sigma$ of a simplicial complex $\Sigma$ is a free face of $\Sigma$ if it is a proper subset of exactly one maximal simplex $\tau$ of $\Sigma$. (The empty simplex $\emptyset$ can sometimes be a free face.)

•  With $\Sigma$ a simplicial complex, $C_k(\Sigma; \mathbb{Z})$ is the group of simplicial $k$-chains over $\Sigma$ with integer coefficients. A $k$-chain $c \in C_k(\Sigma; \mathbb{Z})$ assigns to each oriented $k$-simplex $\tau$ an integer, such that $c(-\tau) = -c(\tau)$. (Caution: We will later use the word “chain” in the poset sense; there should be no ambiguity given context.)

•  Suppose $\Sigma$ is a simplicial complex and $c \in C_k(\Sigma; \mathbb{Z})$. Assume all simplices have been assigned an orientation in $\Sigma$. One can write $c = \sum_i n_i \tau_i$ uniquely, for some collection of (oriented) $k$-dimensional simplices $\{\tau_i\}$ in $\Sigma$ such that $n_i \neq 0$ for each $i$. This means $c(\tau_i) = n_i$ for each $\tau_i$ that appears in the sum and $c(\tau) = 0$ for all other $k$-simplices $\tau$.

We define the support of $c$ as $\|c\| = \bigcup_i \tau_i$. The support is the set of all vertices that appear in any of the simplices $\tau$ for which $c(\tau)$ is nonzero.

•  We let $\partial$ and $\tilde{\partial}$ stand for “boundary”. There are two contexts:

1.  When $V$ is a nonempty finite set of points, then $\partial(V)$ means the simplicial complex whose underlying vertex set is $V$ and whose simplices consist of all proper subsets of $V$. We refer to this complex as the boundary complex of the full simplex on vertex set $V$. It has the homotopy type of a sphere, specifically $\mathbb{S}^{n-2}$, with $n = |V|$, for all $n \geq 1$.

2.  We also designate the simplicial boundary operator by $\partial$ and the reduced boundary operator by $\tilde{\partial}$. These operators are families of maps, describing for each dimension $k$ a group homomorphism $C_k(\Sigma; \mathbb{Z}) \to C_{k-1}(\Sigma; \mathbb{Z})$.

(See below for the special case $k = 0$.)

Given an oriented $k$-simplex $\sigma = \{x_0, \ldots, x_k\}$, with $k \geq 1$,

$$\partial_k(\sigma) = \tilde{\partial}_k(\sigma) = \sum_{i=0}^{k} (-1)^i \tau_i,$$

where $\tau_i$ is the oriented $(k-1)$-simplex formed from $\sigma$ by removing vertex $x_i$ and using the induced orientation of $\sigma$ on $\tau_i$.

For $k = 0$, $\partial_0 : C_0(\Sigma; \mathbb{Z}) \to 0$, while $\tilde{\partial}_0 : C_0(\Sigma; \mathbb{Z}) \to \mathbb{Z}$, with $\tilde{\partial}_0(\{v\}) = 1$, for each vertex $\{v\} \in \Sigma$. There is also a map $\tilde{\partial}_{-1} : \mathbb{Z} \to 0$. See [14, 12] for further details.

We are mainly interested in the reduced boundary operator $\tilde{\partial}$.

We often write $\tilde{\partial}$ in place of $\tilde{\partial}_k$ when the context $k$ is clear.

Elements of the subgroup $\ker(\tilde{\partial}_k)$ are called reduced $k$-cycles.

Elements of the subgroup $\mathrm{img}(\tilde{\partial}_{k+1})$ are called reduced $k$-boundaries.

•  Given a simplicial complex $\Sigma$, $\tilde{H}_k(\Sigma; \mathbb{Z})$ is the reduced homology group in dimension $k$ based on simplicial chains over $\Sigma$ with integer coefficients. It is a quotient group, measuring the number of reduced $k$-cycles that are not reduced $k$-boundaries.

Formally, $\tilde{H}_k(\Sigma; \mathbb{Z}) = \ker(\tilde{\partial}_k) / \mathrm{img}(\tilde{\partial}_{k+1})$. (That makes sense since $\tilde{\partial}_k \circ \tilde{\partial}_{k+1} = 0$.)

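•  Given a simplicial complex $\Sigma$ and a set $\sigma$, we define the following three simplicial subcomplexes of $\Sigma$ in the standard way:

-  The link of $\sigma$ in $\Sigma$: $\mathrm{Lk}(\Sigma, \sigma) = \{\tau \in \Sigma \mid \tau \cap \sigma = \emptyset \text{ and } \tau \cup \sigma \in \Sigma\}$.

-  The deletion of $\sigma$ in $\Sigma$: $\mathrm{dl}(\Sigma, \sigma) = \{\tau \in \Sigma \mid \tau \cap \sigma = \emptyset\}$.

-  The closed star of $\sigma$ in $\Sigma$: $\mathrm{St}(\Sigma, \sigma) = \{\tau \in \Sigma \mid \tau \cup \sigma \in \Sigma\}$.

The definitions make sense even when $\sigma$ is not itself a simplex in $\Sigma$, though in that case both $\mathrm{Lk}(\Sigma, \sigma)$ and $\mathrm{St}(\Sigma, \sigma)$ are the void complex $\emptyset$.

Observe that $\mathrm{dl}(\Sigma, \sigma) \cap \mathrm{St}(\Sigma, \sigma) = \mathrm{Lk}(\Sigma, \sigma)$ and $\mathrm{St}(\Sigma, \sigma) = \mathrm{Lk}(\Sigma, \sigma) * \langle\sigma\rangle$.

Here $*$ means simplicial join (described below) and $\langle\sigma\rangle$ is the simplicial complex generated by $\sigma$, consisting of all subsets of $\sigma$.

When $\sigma$ consists of a single element $v$, i.e., $\sigma = \{v\}$, we tend simply to write $\mathrm{Lk}(\Sigma, v)$, $\mathrm{dl}(\Sigma, v)$, $\mathrm{St}(\Sigma, v)$. Aside: For a singleton $v$, it is further true that $\mathrm{dl}(\Sigma, v) \cup \mathrm{St}(\Sigma, v) = \Sigma$.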

•  One may associate a geometric realization to a finite abstract simplicial complex $\Sigma$ by embedding $\Sigma$ into a finite-dimensional Euclidean space. One may therefore think of $\Sigma$ as a topological space [14, 1].


DISTRIBUTION  A:  Distribution  approved  for  public  release. 


Final  Report,  AFOSR  Award  FA9550-14-1-0012 




•  Suppose $\Sigma$ and $\Gamma$ are two simplicial complexes with underlying vertex sets $X$ and $Y$, respectively. A set function $f : X \to Y$ is said to be a simplicial map if it satisfies the following condition: If $\sigma \in \Sigma$, then $f(\sigma) \in \Gamma$.

In that case, one may view $f$ as a map of simplicial complexes, $f : \Sigma \to \Gamma$.

A simplicial map may further be viewed as a continuous function between the geometric realizations of $\Sigma$ and $\Gamma$.

•  When $X_1$ and $X_2$ are topological spaces, the notation $X_1 \simeq X_2$ means that $X_1$ and $X_2$ have the same homotopy type [1, 12].

•  When $X_1$ and $X_2$ are topological spaces, $X_1 \vee X_2$ means a wedge sum of $X_1$ and $X_2$ [12].

•  Suppose $\mathcal{U}$ is a finite nonvoid collection of (not necessarily distinct) topological subspaces of some ambient space. One may define a simplicial complex $\mathcal{N}(\mathcal{U})$ called the nerve of $\mathcal{U}$: The simplices of $\mathcal{N}(\mathcal{U})$ are given by the empty simplex and all nonvoid finite subcollections $\{U_1, \ldots, U_k\}$ of $\mathcal{U}$ such that $U_1 \cap \cdots \cap U_k \neq \emptyset$. Under certain conditions, if these nonempty intersections are contractible, then the nerve has the same homotopy type as the union of all the elements in $\mathcal{U}$: $\mathcal{N}(\mathcal{U}) \simeq \bigcup_{U \in \mathcal{U}} U$. See [1, 12] for conditions.

•  Suppose $\Sigma$ and $\Gamma$ are simplicial complexes with disjoint underlying vertex sets. The simplicial join [18] of $\Sigma$ and $\Gamma$ is the simplicial complex

$$\Sigma * \Gamma = \{\sigma \cup \gamma \mid \sigma \in \Sigma \text{ and } \gamma \in \Gamma\}.$$

The underlying vertex set of $\Sigma * \Gamma$ is the union of the underlying vertex sets of $\Sigma$ and $\Gamma$.
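The link, deletion, closed star, and join defined in this subsection can be checked on a small example. The following Python sketch is only an illustration: it represents a simplicial complex as a set of frozensets, and the function names and the sample complex (a filled triangle {1,2,3} with an extra edge {3,4}) are assumptions made for this example rather than notation from the report.

```python
from itertools import combinations

def generate(simplices):
    """Complex generated by the given simplices: all of their subsets (empty simplex included)."""
    cx = set()
    for s in simplices:
        items = sorted(s)
        for k in range(len(items) + 1):
            cx.update(frozenset(c) for c in combinations(items, k))
    return cx

def link(cx, sigma):
    return {t for t in cx if not (t & sigma) and (t | sigma) in cx}

def deletion(cx, sigma):
    return {t for t in cx if not (t & sigma)}

def star(cx, sigma):
    return {t for t in cx if (t | sigma) in cx}

def join(cx1, cx2):
    """Simplicial join; assumes the two underlying vertex sets are disjoint."""
    return {a | b for a in cx1 for b in cx2}

def free_faces(cx):
    """Simplices that are proper subsets of exactly one maximal simplex."""
    maximal = [m for m in cx if not any(m < t for t in cx)]
    return {s for s in cx if sum(1 for m in maximal if s < m) == 1}

# Hypothetical example: filled triangle {1,2,3} plus the edge {3,4}.
cx = generate([{1, 2, 3}, {3, 4}])
v = frozenset({1})
lk, dl, st = link(cx, v), deletion(cx, v), star(cx, v)

print(dl & st == lk)                    # checks dl ∩ St = Lk
print(st == join(lk, generate([v])))    # checks St = Lk * <v>
print(dl | st == cx)                    # checks dl ∪ St = Σ for a single vertex
print(link(cx, frozenset({1, 4})))      # {1,4} is not a simplex, so its link is void
```

Sets of frozensets keep the code close to the set-theoretic definitions; the empty frozenset plays the role of the empty simplex, and an empty Python set plays the role of the void complex.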

A.2  Partially Ordered Sets (Posets)

We  largely  follow  the  notation  of  [18]. 

•  A poset $P$ is a set of elements with a partial order, sometimes written simply as “$\leq$”, other times as “$\leq_P$”. The symbols “<”, “>” and “=” are defined accordingly.

•  A chain $c$ in a poset $P$ is a totally ordered subset of $P$, which we often write as $c = \{p_0 < p_1 < \cdots < p_\ell\}$. The length $\ell(c) = \ell$ of a chain $c$ is one less than the number of elements in the chain (much like simplex dimension). The length of the empty chain is $-1$. The length $\ell(P)$ of a poset $P$ is the maximum length of any chain in $P$.

•  The face poset $\mathfrak{F}(\Sigma)$ of a nonvoid simplicial complex $\Sigma$ consists of all nonempty simplices of $\Sigma$, partially ordered by set inclusion.

•  The order complex $\Delta(P)$ of a poset $P$ is the simplicial complex whose simplices are given by all finite chains $\{p_0 < p_1 < \cdots < p_\ell\}$ of $P$. (If $P = \emptyset$, then $\Delta(P) = \{\emptyset\}$.)

•  One may speak of the topology of a poset: One says that a poset $P$ has a topological property when its order complex $\Delta(P)$ has that property. For instance, to say that a poset is contractible means that its order complex is contractible. To say that two posets $P$ and $Q$ are homotopic means that $\Delta(P)$ and $\Delta(Q)$ have the same homotopy type. Etc.

•  It is a fact that $\Delta(\mathfrak{F}(\Sigma))$ is homeomorphic to $\Sigma$. Indeed, $\Delta(\mathfrak{F}(\Sigma))$ may be viewed as the first barycentric subdivision of $\Sigma$, which we write as $\mathrm{sd}(\Sigma)$. See [18, 16].

•  A set function $\theta : P \to Q$ between two posets $P$ and $Q$ is said to be a poset map if it is either order-preserving or order-reversing. That means, for all $x, y \in P$:

order-preserving: If $x \leq_P y$, then $\theta(x) \leq_Q \theta(y)$.

order-reversing: If $x \leq_P y$, then $\theta(x) \geq_Q \theta(y)$.

•  A poset map $\theta : P \to Q$ between two posets $P$ and $Q$ induces a simplicial map between the associated order complexes $\theta : \Delta(P) \to \Delta(Q)$.

•  An order-preserving poset self-map $\theta : P \to P$ is said to be a closure operator when $x \leq \theta(x)$, for all $x \in P$, and $\theta \circ \theta = \theta$. A closure operator $\theta$ defines a homotopy equivalence between $P$ and the image $\theta(P)$. See [1, 18].
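As a small illustration of the face poset and order complex, the sketch below enumerates the chains of the face poset of a hollow triangle. It is a brute-force example under stated assumptions: the function name `order_complex`, the choice of example, and the representation of chains as frozensets of frozensets are all illustrative and do not come from the report.

```python
from itertools import combinations

def order_complex(elements, less):
    """All chains of a finite poset (as frozensets), including the empty chain."""
    elems = list(elements)
    chains = set()
    for k in range(len(elems) + 1):
        for c in combinations(elems, k):
            # A chain is a subset in which every two elements are comparable.
            if all(less(a, b) or less(b, a) for a, b in combinations(c, 2)):
                chains.add(frozenset(c))
    return chains

# Hypothetical example: the boundary complex of the full simplex on {1,2,3},
# i.e. all proper subsets of {1,2,3} (a hollow triangle).
V = [1, 2, 3]
boundary = {frozenset(c) for k in range(len(V)) for c in combinations(V, k)}

# Face poset: the nonempty simplices, ordered by inclusion.
face_poset = {s for s in boundary if s}

# Order complex of the face poset, which models the first barycentric subdivision.
sd = order_complex(face_poset, lambda a, b: a < b)

length = max(len(c) for c in sd) - 1            # length of the face poset
vertices = [c for c in sd if len(c) == 1]
edges = [c for c in sd if len(c) == 2]
print(len(vertices), len(edges), length)
```

In this example the order complex has as many vertices as edges and no higher simplices, as one expects for a subdivided circle.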

A.3  Semi-Lattices and Lattices

Let $L$ be a partially ordered set and suppose $p, q \in L$:

•  If $p$ and $q$ have a least upper bound, then one writes $p \vee q$ to mean that least upper bound. If every pair of elements has a least upper bound, one says that $L$ is a join semi-lattice.

•  If $p$ and $q$ have a greatest lower bound, then one writes $p \wedge q$ to mean that greatest lower bound. If every pair of elements has a greatest lower bound, one says that $L$ is a meet semi-lattice.

•  A poset that is both a join semi-lattice and a meet semi-lattice is said to be a lattice.

•  If $L$ has a unique top element, we may designate that element by $\hat{1}$ or $\hat{1}_L$.

•  If $L$ has a unique bottom element, we may designate that element by $\hat{0}$ or $\hat{0}_L$.

•  If $L$ is a finite join semi-lattice with a unique bottom element, then $L$ is a lattice. Similarly, if $L$ is a finite meet semi-lattice with a unique top element, then $L$ is a lattice.

•  A lattice $L$ is said to be bounded if it has a unique top element $\hat{1}$ and a unique bottom element $\hat{0}$. (These are the same element if $L$ is a singleton.)

•  When $L$ is a bounded lattice, the proper part of $L$ is the poset $\overline{L} = L \setminus \{\hat{0}, \hat{1}\}$.

•  Suppose $L$ is a bounded lattice and $p \in L$. Then the complements of $p$ are given by the set $\mathcal{C}(p) = \{q \in L \mid q \vee p = \hat{1} \text{ and } q \wedge p = \hat{0}\}$.

•  A bounded lattice $L$ is said to be noncomplemented if $\mathcal{C}(p) = \emptyset$ for at least one $p \in L$. The proper part $\overline{L}$ of a noncomplemented lattice $L$ is contractible ([1], Theorem 10.15).

•  Suppose $L$ is a bounded lattice. The elements of $L$ immediately below $\hat{1}$ are called co-atoms. These are the maximal elements of $\overline{L}$. The elements immediately above $\hat{0}$ are called atoms. These are the minimal elements of $\overline{L}$.
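To make the lattice vocabulary concrete, here is a minimal sketch using a lattice that does not appear in the report: the divisors of 12 ordered by divisibility, with join given by least common multiple and meet by greatest common divisor. The example and all names in it are assumptions chosen for illustration.

```python
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

# Hypothetical bounded lattice: divisors of 12 under divisibility,
# with join = lcm, meet = gcd, top = 12 and bottom = 1.
L = [d for d in range(1, 13) if 12 % d == 0]
top, bottom = 12, 1

def complements(p):
    """Elements q with q ∨ p = top and q ∧ p = bottom."""
    return {q for q in L if lcm(q, p) == top and gcd(q, p) == bottom}

proper = [p for p in L if p not in (top, bottom)]       # the proper part
# Atoms: minimal elements of the proper part; co-atoms: maximal elements.
atoms = {p for p in proper if not any(q != p and p % q == 0 for q in proper)}
coatoms = {p for p in proper if not any(q != p and q % p == 0 for q in proper)}
# Noncomplemented: at least one element has no complement.
noncomplemented = any(not complements(p) for p in L)

print(complements(3), complements(2), atoms, coatoms, noncomplemented)
```

In this lattice the element 2 has no complement, so the lattice is noncomplemented in the sense defined above.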
A.4  Relations

Let $R$ be a relation on $X \times Y$. We use the following notation and conventions:

•  $R$ is a set of pairs, namely a subset of the cross product $X \times Y$. It is convenient sometimes to view $R$ as a matrix of 0s and 1s, perhaps drawn as a matrix of blank and nonblank entries, representing the characteristic function of this set of pairs.

•  Even if $X \neq \emptyset$ and $Y \neq \emptyset$, it is possible that $R = \emptyset$, in which case we say that $R$ is an empty relation.

•  If either $X = \emptyset$ or $Y = \emptyset$, then we say that $R$ is a void relation.

On  some  occasions,  we  may  treat  a  void  relation  R  much  like  an  empty  relation,  in  the 
sense  that  we  will  let  the  Dowker  complexes  defined  below  be  empty  rather  than  void. 
That  view  will  sometimes  be  convenient  when  R  is  derived  from  some  encompassing 
relation  as  a  link  or  deletion  in  a  simplicial  complex. 

•  We  often  refer  to  elements  of  X  as  individuals  and  elements  of  Y  as  attributes. 

•  $Y_x$ is the set of attributes that individual $x$ has (in relation $R$). Viewing $R$ as a matrix, one may think of $Y_x$ as the row of $R$ indexed by $x$. We say that the row is blank when $Y_x = \emptyset$.

•  $X_y$ is the set of individuals who have attribute $y$ (in relation $R$). Viewing $R$ as a matrix, one may think of $X_y$ as the column of $R$ indexed by $y$. The column is blank when $X_y = \emptyset$.

•  $\Phi_R$ is the Dowker simplicial complex associated with $R$ whose underlying vertex set is $Y$. A nonempty subset $\gamma$ of $Y$ is a simplex in $\Phi_R$ precisely when there exists $x \in X$ such that $(x, y) \in R$ for all $y \in \gamma$. We refer to $x$ as a witness for $\gamma$.

When $R$ is void, $\Phi_R$ is void as well, except as otherwise indicated in the text.

When $R$ is nonvoid, $\Phi_R$ contains the empty simplex. Moreover, we may view $\Phi_R$ as generated by all the rows of $R$. In particular, $Y_x \in \Phi_R$ for each $x \in X$.

•  $\Psi_R$ is the Dowker simplicial complex associated with $R$ whose underlying vertex set is $X$. A nonempty subset $\sigma$ of $X$ is a simplex in $\Psi_R$ precisely when there exists $y \in Y$ such that $(x, y) \in R$ for all $x \in \sigma$. We refer to $y$ as a witness for $\sigma$.

When $R$ is void, $\Psi_R$ is void as well, except as otherwise indicated in the text.

When $R$ is nonvoid, $\Psi_R$ contains the empty simplex. Moreover, we may view $\Psi_R$ as generated by all the columns of $R$. In particular, $X_y \in \Psi_R$ for each $y \in Y$.

•  There exist homotopy equivalences $\phi_R : \Psi_R \to \Phi_R$ and $\psi_R : \Phi_R \to \Psi_R$.

Viewed as poset maps $\phi_R : \mathfrak{F}(\Psi_R) \to \mathfrak{F}(\Phi_R)$ and $\psi_R : \mathfrak{F}(\Phi_R) \to \mathfrak{F}(\Psi_R)$, one obtains explicit formulas, sending nonempty simplices to nonempty simplices:

$$\phi_R(\sigma) = \bigcap_{x \in \sigma} Y_x \quad \text{and} \quad \psi_R(\gamma) = \bigcap_{y \in \gamma} X_y.$$

Suppose $\sigma \neq \emptyset$ and $\gamma \neq \emptyset$. Then the intersections appearing in the previous formulas comprise the witnesses for the respective simplex arguments. Consequently, one may use the formulas more generally as tests for membership in the Dowker complexes:

-  For any $\sigma \subseteq X$, $\sigma \in \Psi_R$ if and only if $\phi_R(\sigma) \neq \emptyset$.

-  For any $\gamma \subseteq Y$, $\gamma \in \Phi_R$ if and only if $\psi_R(\gamma) \neq \emptyset$.

These tests also make sense for the empty set, that is, when $\sigma = \emptyset$ or $\gamma = \emptyset$. In particular, $\phi_R(\emptyset) = Y$ and $\psi_R(\emptyset) = X$.

Composing $\phi_R$ and $\psi_R$ as $\psi_R \circ \phi_R : \mathfrak{F}(\Psi_R) \to \mathfrak{F}(\Psi_R)$ and $\phi_R \circ \psi_R : \mathfrak{F}(\Phi_R) \to \mathfrak{F}(\Phi_R)$ produces closure operators. See Appendix B for further details.

•  $P_R$ is the doubly-labeled poset associated with $R$ as per Definition 3 on page 20. Each element in $P_R$ is of the form $(\sigma, \gamma)$, with $\sigma \neq \emptyset$ and $\gamma \neq \emptyset$, such that $\sigma = \psi_R(\gamma)$ and $\gamma = \phi_R(\sigma)$.

One may view $P_R$ either as the image $(\psi_R \circ \phi_R)(\mathfrak{F}(\Psi_R))$ or as the image $(\phi_R \circ \psi_R)(\mathfrak{F}(\Phi_R))$.

•  $P_R^+$ is the Galois lattice formed from $P_R$ as per Definition 12 on page 38.

We sometimes view $P_R$ as “almost a join-based lattice”, as per Definition 24 on page 47. That amounts to adjoining a single new element $\hat{1}$ above $P_R$, then inducing a join operation on $P_R \cup \{\hat{1}\}$ from the join operation on $P_R$. Thus $P_R \cup \{\hat{1}\}$ is a join semi-lattice. If we further adjoin a new bottom element $\hat{0}$, then $P_R \cup \{\hat{0}, \hat{1}\}$ is a lattice.

•  One may speak of the topology of a relation: One says that a relation $R$ has a topological property when any and all of $\Phi_R$, $\Psi_R$, and $\Delta(P_R)$ have that property. (This convention makes sense by Dowker’s Theorem on page 17 and the nature of $P_R$.)
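The maps $\phi_R$ and $\psi_R$, the membership tests for the two Dowker complexes, and the elements of $P_R$ can all be computed directly from a relation. The sketch below does so by brute force for a small relation that is hypothetical and not taken from the report; the dictionary representation (individual to set of attributes) and the function names are likewise assumptions for illustration.

```python
from itertools import combinations

def subsets(s):
    items = sorted(s)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def phi(rows, Y, sigma):
    """phi_R(sigma): attributes shared by all individuals in sigma (all of Y when sigma is empty)."""
    return frozenset(y for y in Y if all(y in rows[x] for x in sigma))

def psi(rows, gamma):
    """psi_R(gamma): individuals having every attribute in gamma (all of X when gamma is empty)."""
    return frozenset(x for x in rows if set(gamma) <= rows[x])

# Hypothetical relation: each individual (row) maps to its set of attributes.
rows = {1: {"a", "b"}, 2: {"a", "c"}, 3: {"a"}}
X, Y = set(rows), {"a", "b", "c"}

# Membership tests: a set is a simplex exactly when its image is nonempty.
Phi = {g for g in subsets(Y) if psi(rows, g)}
Psi = {s for s in subsets(X) if phi(rows, Y, s)}

# Elements (sigma, gamma) of P_R: nonempty sets with sigma = psi(gamma) and gamma = phi(sigma).
P_R = {(s, phi(rows, Y, s)) for s in Psi if s and psi(rows, phi(rows, Y, s)) == s}

# Closure of {b} under phi_R o psi_R: observing b also reveals a in this relation.
closure_b = phi(rows, Y, psi(rows, frozenset({"b"})))

print(sorted(map(sorted, Phi)), len(Psi), sorted(closure_b))
for s, g in sorted(P_R, key=lambda p: sorted(p[0])):
    print(sorted(s), sorted(g))
```

In this example individual 3 does not appear alone in any element of $P_R$, because its only attribute is shared by everyone.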
B  Basic Tools


This  appendix  reviews  some  basic  facts  about  relations,  their  Dowker  complexes,  and  the 
Galois  connection.  Recall  the  formulas  from  page  85. 

Although  we  do  not  always  say  so  explicitly,  there  are  dual  statements  for  the  lemmas  and 
corollaries  in  this  appendix,  for  each  of  the  two  perspectives  offered  by  Dowker’s  Theorem,  by 
inverting  the  roles  of  individuals  and  attributes. 

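Lemma 41. Let $R$ be a relation on $X \times Y$. Then $\phi_R$ is inclusion-reversing.

Proof. Let $\sigma' \subseteq \sigma \subseteq X$. Then:

$$\phi_R(\sigma') = \bigcap_{x \in \sigma'} Y_x \supseteq \bigcap_{x \in \sigma} Y_x = \phi_R(\sigma).$$

Just to be careful: if $\sigma' = \emptyset$, then $\phi_R(\sigma') = Y$, which does indeed contain $\phi_R(\sigma)$.  □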


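Each of $\phi_R$ and $\psi_R$ is inclusion-reversing, so $\phi_R \circ \psi_R$ is inclusion-preserving. Lemmas 42 and 44 establish that $\phi_R \circ \psi_R$ is a closure operator when viewed as a poset map $\mathfrak{F}(\Phi_R) \to \mathfrak{F}(\Phi_R)$:

Lemma 42. Let $R$ be a relation on $X \times Y$. For all $\gamma \subseteq Y$, $\gamma \subseteq (\phi_R \circ \psi_R)(\gamma)$.

Proof.

$$(\phi_R \circ \psi_R)(\gamma) = \bigcap_{x \in \sigma} Y_x, \quad \text{with } \sigma = \bigcap_{y \in \gamma} X_y.$$

The assertion is clear if $\gamma = \emptyset$ or $\sigma = \emptyset$. Otherwise, let $y \in \gamma$ and $x \in \sigma$. Then $x \in X_y$, so $y \in Y_x$. Since $x$ is arbitrary in $\sigma$, we see that $y \in (\phi_R \circ \psi_R)(\gamma)$ and thus $\gamma \subseteq (\phi_R \circ \psi_R)(\gamma)$.  □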

Corollary 43. Let $R$ be a relation on $X \times Y$.

If $\gamma$ is a maximal simplex of $\Phi_R$ then $(\phi_R \circ \psi_R)(\gamma) = \gamma$.

Proof. When $\gamma \neq \emptyset$, this assertion follows from Lemma 42 and maximality of $\gamma$. Otherwise, apparently $\Phi_R = \{\emptyset\}$ and so $(\phi_R \circ \psi_R)(\emptyset) = \phi_R(X) = \emptyset$ (since $\phi_R$ must map $X$ into $\Phi_R$).  □


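Lemma 44. Let $R$ be a relation on $X \times Y$.

For all $\gamma \subseteq Y$, $((\phi_R \circ \psi_R) \circ (\phi_R \circ \psi_R))(\gamma) = (\phi_R \circ \psi_R)(\gamma)$.

Proof. Consider:

$$\gamma \;\overset{\psi_R}{\longmapsto}\; \sigma \;\overset{\phi_R}{\longmapsto}\; \gamma' \;\overset{\psi_R}{\longmapsto}\; \sigma' \;\overset{\phi_R}{\longmapsto}\; \gamma''.$$

We need to show that $\gamma' = \gamma''$.

By Lemma 42 and its dualization, $\gamma \subseteq \gamma' \subseteq \gamma''$ and $\sigma \subseteq \sigma'$.

By Lemma 41, $\phi_R$ is inclusion-reversing, so $\sigma \subseteq \sigma'$ implies $\gamma' \supseteq \gamma''$, and thus $\gamma' = \gamma''$.
Comment: By the dual of Lemma 41, $\psi_R$ is inclusion-reversing, so in fact also $\sigma = \sigma'$.  □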


Corollary 45. Let $R$ be a relation on $X \times Y$. For all $\sigma \subseteq X$, $(\phi_R \circ \psi_R)(\phi_R(\sigma)) = \phi_R(\sigma)$.

Proof. This follows from a dual version of the comment at the end of the proof of Lemma 44.  □


Corollary 46. Let $R$ be a relation on $X \times Y$. For all $x \in X$, $(\phi_R \circ \psi_R)(Y_x) = Y_x$.

Proof. The assertion follows from Corollary 45, with $\sigma = \{x\}$.

(This includes the case $Y_x = \emptyset$.)  □
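The statements of Lemmas 41, 42 and 44 and of Corollary 46 can be spot-checked by exhaustive enumeration on a small relation. The following sketch does that for one hypothetical relation (including a blank row); it is a sanity check under assumed names and data, not a substitute for the proofs.

```python
from itertools import combinations

def subsets(s):
    items = sorted(s)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def phi(rows, Y, sigma):
    return frozenset(y for y in Y if all(y in rows[x] for x in sigma))

def psi(rows, gamma):
    return frozenset(x for x in rows if set(gamma) <= rows[x])

# Hypothetical relation; individual 4 has a blank row.
rows = {1: {"a", "b"}, 2: {"a", "c"}, 3: {"a"}, 4: set()}
Y = {"a", "b", "c"}

def cl(gamma):
    """The composition phi_R o psi_R applied to a set of attributes."""
    return phi(rows, Y, psi(rows, gamma))

all_Y = subsets(Y)

# Lemma 41: phi_R is inclusion-reversing (sub ranges over subsets of sup).
reversing = all(phi(rows, Y, sub) >= phi(rows, Y, sup)
                for sup in subsets(rows) for sub in subsets(sup))
# Lemma 42: gamma is contained in its closure.
extensive = all(g <= cl(g) for g in all_Y)
# Lemma 44: applying the closure twice changes nothing.
idempotent = all(cl(cl(g)) == cl(g) for g in all_Y)
# Corollary 46: every row is closed, including the blank one.
rows_closed = all(cl(frozenset(rows[x])) == frozenset(rows[x]) for x in rows)

print(reversing, extensive, idempotent, rows_closed)
```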
Lemma 47. Let $R$ be a relation on $X \times Y$ and suppose $\eta \subseteq Y$.

The following two conditions are equivalent:

(a)  $(\phi_R \circ \psi_R)(\chi) = \chi$, for every proper subset $\chi$ of $\eta$,

(b)  $(\phi_R \circ \psi_R)(\gamma) = \gamma$, for all $\gamma$ of the form $\gamma = \eta \setminus \{y\}$ with $y \in \eta$.

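Proof. Certainly (a) implies (b). Suppose (b) holds, but there is some $\chi \subsetneq \eta$ such that $(\phi_R \circ \psi_R)(\chi) \neq \chi$. Let $y \in (\phi_R \circ \psi_R)(\chi) \setminus \chi$ and consider $\gamma = \eta \setminus \{y\}$.

Observe that $\chi \subseteq \gamma$, so $y \in (\phi_R \circ \psi_R)(\chi) \subseteq (\phi_R \circ \psi_R)(\gamma)$. Consequently,

$\eta = \gamma \cup \{y\} \subseteq (\phi_R \circ \psi_R)(\gamma) = \gamma \subsetneq \eta$, which is a contradiction. (This step uses $y \in \eta$. If instead $y \notin \eta$, pick any $y' \in \eta \setminus \chi$ and set $\gamma = \eta \setminus \{y'\}$; then $y \in (\phi_R \circ \psi_R)(\chi) \subseteq (\phi_R \circ \psi_R)(\gamma) = \gamma \subseteq \eta$, again a contradiction.)  □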


Definition 48 (Connected). A relation $R$ on $X \times Y$ is connected if $R$ is connected when viewed as an undirected bipartite graph on the vertex sets $X$ and $Y$.

Definition  49  (Tight).  A  relation  R  on  X  x  Y  is  tight  if  it  has  no  blank  rows  or  columns. 


Lemma 50 (Connectedness). Let $R$ be a tight relation on $X \times Y$, with both $X$ and $Y$ nonempty. Then the following three conditions are equivalent:

(a)  $R$ is connected.

(b)  $\Psi_R$ is path-connected.

(c)  $\Phi_R$ is path-connected.

Proof. We will show that (a) and (b) are equivalent. The proof for (a) and (c) is similar, or one can simply invoke Dowker duality.

I. Suppose $R$ is connected. Consider two vertices $x_0$ and $x_f$ of $\Psi_R$. Since $R$ is connected as a bipartite graph, there exists a path $x_0, y_1, x_1, y_2, \ldots, y_n, x_n = x_f$. Observe that each $y_i$ is a witness for the simplex $\{x_{i-1}, x_i\} \in \Psi_R$, so in $\Psi_R$ there exist edges $\{x_0, x_1\}, \ldots, \{x_{n-1}, x_n\}$. Since $\Psi_R$ is a simplicial complex, we see that it is path-connected.

II. Suppose $\Psi_R$ is path-connected. Since $R$ is tight, each $y \in Y$ appears as the vertex of an edge $(x, y)$ in the bipartite graph $R$. To show that $R$ is connected, it therefore is enough to show that any two elements $x_0$ and $x_f$ of $X$ may be connected by a path in the bipartite graph. Since $R$ is tight, $x_0$ and $x_f$ are each vertices of $\Psi_R$. Since $\Psi_R$ is path-connected, there exists a path between $x_0$ and $x_f$ in $\Psi_R$. Since $\Psi_R$ is a finite simplicial complex, we can deform that path so that it consists of finitely many edges $\{x_0, x_1\}, \ldots, \{x_{n-1}, x_n\}$, with each $x_i$ a vertex of $\Psi_R$ and $x_n = x_f$. Each edge $\{x_{i-1}, x_i\}$ has some witness $y_i \in Y$. So $x_0, y_1, x_1, y_2, \ldots, y_n, x_f$ is a path connecting $x_0$ and $x_f$ in the bipartite graph $R$.  □
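The equivalence of (a) and (b) in Lemma 50 can be illustrated computationally. The sketch below compares connectivity of the bipartite graph with connectivity of the 1-skeleton of $\Psi_R$, where two individuals are joined by an edge when they share an attribute. The relations and function names are hypothetical; the relation is stored as nonblank rows only, so that it is tight by construction.

```python
def bipartite_connected(rows):
    """Connectivity of R viewed as a bipartite graph on individuals and attributes."""
    nodes = {("x", x) for x in rows} | {("y", y) for r in rows.values() for y in r}
    start = ("x", next(iter(rows)))
    seen, frontier = {start}, [start]
    while frontier:
        kind, n = frontier.pop()
        if kind == "x":
            nbrs = {("y", y) for y in rows[n]}
        else:
            nbrs = {("x", x) for x in rows if n in rows[x]}
        for m in nbrs - seen:
            seen.add(m)
            frontier.append(m)
    return seen == nodes

def psi_connected(rows):
    """Path-connectedness of Psi_R, tested on its 1-skeleton."""
    xs = list(rows)
    seen, frontier = {xs[0]}, [xs[0]]
    while frontier:
        x = frontier.pop()
        for z in xs:
            # {x, z} is an edge of Psi_R when some attribute witnesses both.
            if z not in seen and rows[x] & rows[z]:
                seen.add(z)
                frontier.append(z)
    return seen == set(xs)

# Hypothetical tight relations (attributes are those appearing in some row).
chain = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
split = {1: {"a"}, 2: {"a", "b"}, 3: {"c"}}

print(bipartite_connected(chain), psi_connected(chain))
print(bipartite_connected(split), psi_connected(split))
```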
Lemma 51 (Components). Let $R$ be a tight relation on $X \times Y$, with both $X$ and $Y$ nonempty. Suppose $R = R_1 \cup \cdots \cup R_\ell$, with the $\{R_i\}$ pairwise disjoint and each $R_i$ a connected component of $R$ viewed as a bipartite graph on $X$ and $Y$. Then $X$, $Y$, $\Psi_R$, and $\Phi_R$ decompose as follows:

(a)  $X = X_1 \cup \cdots \cup X_\ell$, with the $\{X_i\}$ pairwise disjoint and each $X_i$ not empty.

(b)  $Y = Y_1 \cup \cdots \cup Y_\ell$, with the $\{Y_i\}$ pairwise disjoint and each $Y_i$ not empty.

(c)  $R_i$ is the restriction of $R$ to $X_i \times Y_i$, and is tight, for $i = 1, \ldots, \ell$.

(d)  $\Psi_R = \Psi_{R_1} \cup \cdots \cup \Psi_{R_\ell}$, with the $\{\Psi_{R_i}\}$ pairwise disjoint and each $\Psi_{R_i}$ path-connected.

(e)  $\Phi_R = \Phi_{R_1} \cup \cdots \cup \Phi_{R_\ell}$, with the $\{\Phi_{R_i}\}$ pairwise disjoint and each $\Phi_{R_i}$ path-connected.


Proof. Let $X_i = \{x \mid (x, y) \in R_i \text{ for some } y \in Y\}$ and $Y_i = \{y \mid (x, y) \in R_i \text{ for some } x \in X\}$, for $i = 1, \ldots, \ell$. These sets are nonempty since the components of $R$ are necessarily nonempty.

To see that $X_i \cap X_j = \emptyset$ unless $i = j$, suppose $x \in X_i \cap X_j$. Then $(x, y) \in R_i$ for some $y \in Y$ and $(x, y') \in R_j$ for some $y' \in Y$. Since $R_i$ and $R_j$ are connected components of $R$, $i = j$. Next observe that each $x$ of $X$ must appear in some $X_i$ since $R$ has no blank rows. Point (a) follows. Point (b) is similar.

For (c), observe that if $(x, y) \in R_i \subseteq R$ then $x \in X_i$ and $y \in Y_i$, so $(x, y)$ is in the restriction of $R$ to $X_i \times Y_i$. Conversely, if $(x, y) \in R$ with $x \in X_i$ and $y \in Y_i$, then $(x, y) \in R_j$ for some $j$. By the previous reasoning, $i = j$. Tightness follows by definition of $X_i$ and $Y_i$.

For (d), $\Psi_{R_i} \subseteq \Psi_R$ since $R_i \subseteq R$, for each $i = 1, \ldots, \ell$. Now suppose $\emptyset \neq \sigma \in \Psi_R$. Then there exists $y \in Y$ such that $(x, y) \in R$ for every $x \in \sigma$. For some $i$, $y \in Y_i$. Since $R_i$ is a connected component of $R$, $(x, y) \in R_i$ for every $x \in \sigma$, so $\sigma \in \Psi_{R_i}$. The $\{\Psi_{R_i}\}$ are pairwise disjoint since the underlying vertex sets $\{X_i\}$ are pairwise disjoint. Path-connectedness follows from Lemma 50, since each $R_i$ is tight and connected. Point (e) is similar.  □
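Point (d) of Lemma 51 can be illustrated by splitting a relation into its connected components and comparing Dowker complexes. The sketch below is a brute-force illustration with a hypothetical tight relation; the function names `components` and `Psi` are assumptions made for this example.

```python
from itertools import combinations

def components(rows):
    """Split a tight relation into connected components, each as a dict of rows."""
    remaining, parts = dict(rows), []
    while remaining:
        x0, attrs = remaining.popitem()
        part = {x0: attrs}
        grew = True
        while grew:
            grew = False
            ys = set().union(*part.values())
            # Pull in every remaining individual sharing an attribute with the part.
            for x in [x for x in remaining if remaining[x] & ys]:
                part[x] = remaining.pop(x)
                grew = True
        parts.append(part)
    return parts

def Psi(rows):
    """Psi_R: sets of individuals with a common attribute, plus the empty simplex."""
    xs = sorted(rows)
    out = {frozenset()}
    for k in range(1, len(xs) + 1):
        for c in combinations(xs, k):
            if set.intersection(*(rows[x] for x in c)):
                out.add(frozenset(c))
    return out

# Hypothetical tight relation with two components.
rows = {1: {"a", "b"}, 2: {"b"}, 3: {"c"}, 4: {"c", "d"}}
parts = components(rows)

union_of_parts = set().union(*(Psi(p) for p in parts))
print(sorted(sorted(p) for p in parts))
print(union_of_parts == Psi(rows))
```

In this representation the component complexes still share the empty simplex; the disjointness in the lemma refers to their vertex sets.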


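Corollary 52 (Component Maps). Assume the hypotheses and constructions as in Lemma 51 and its proof. Then:

$$\psi_{R_i}(\gamma) = \psi_R(\gamma), \quad \text{for each } \emptyset \neq \gamma \in \Phi_{R_i},$$

$$\phi_{R_i}(\sigma) = \phi_R(\sigma), \quad \text{for each } \emptyset \neq \sigma \in \Psi_{R_i}, \quad i = 1, \ldots, \ell.$$

Proof. By direct computation:

$$\psi_{R_i}(\gamma) = \bigcap_{y \in \gamma} (X_y \cap X_i) = \bigcap_{y \in \gamma} X_y = \psi_R(\gamma).$$

The second equality comes from the fact that each $X_y$ can touch only $X_i$, since $R_i$ is a connected component of $R$. The argument for the $\phi_{R_i}$ maps is similar.  □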


Corollary 53 (Component Privacy). Assume the hypotheses and constructions as in Lemma 51 and its proof. Let $i \in \{1, \ldots, \ell\}$.

If $\psi_R \circ \phi_R$ is the identity on $\Psi_R$ and $Y_i \notin \Phi_{R_i}$, then $\psi_{R_i} \circ \phi_{R_i}$ is the identity on $\Psi_{R_i}$.

If $\phi_R \circ \psi_R$ is the identity on $\Phi_R$ and $X_i \notin \Psi_{R_i}$, then $\phi_{R_i} \circ \psi_{R_i}$ is the identity on $\Phi_{R_i}$.

Proof. Suppose $\emptyset \neq \sigma \in \Psi_{R_i}$, then $\emptyset \neq \phi_{R_i}(\sigma) \in \Phi_{R_i}$, so by Corollary 52, $(\psi_{R_i} \circ \phi_{R_i})(\sigma) = (\psi_R \circ \phi_R)(\sigma) = \sigma$. And $(\psi_{R_i} \circ \phi_{R_i})(\emptyset) = \psi_{R_i}(Y_i) = \emptyset$, since $Y_i \notin \Phi_{R_i}$.

The argument for $\phi_{R_i} \circ \psi_{R_i}$ is similar.  □

C  Links and Inference

This  appendix  provides  some  technical  tools  for  modeling  inference,  particularly  in  links,  ending 
with  some  instances  in  which  inference  is  unavoidable. 

Intuition: The link $\mathrm{Lk}(\Phi_R, \gamma)$ of a set of attributes $\gamma$ in the Dowker complex $\Phi_R$ can be understood as a description of what may yet be observed or inferred, conditional on having already observed $\gamma$.

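Lemma 54. Let $R$ be a relation on $X \times Y$, with both $X$ and $Y$ nonempty. Suppose $\gamma \in \Phi_R$. Define relation $Q$ as a restriction of $R$ by

$$Q = R|_{\sigma \times \tilde{Y}}, \quad \text{with } \sigma = \psi_R(\gamma) \text{ and } \tilde{Y} = \Big(\bigcup_{x \in \sigma} Y_x\Big) \setminus \gamma.$$

Then $\mathrm{Lk}(\Phi_R, \gamma) = \Phi_Q$, as collections of simplices (i.e., ignoring underlying vertex sets).

(Observe that $\sigma \neq \emptyset$. If $\tilde{Y} = \emptyset$, then technically $Q$ is void, but it is convenient to let both $\Phi_Q$ and $\Psi_Q$ be instances of the empty complex $\{\emptyset\}$. In a standard link, one might define $\tilde{Y} = Y \setminus \gamma$. With $\tilde{Y}$ as above, $Q$ always discards blank columns of $R$, even when $\gamma = \emptyset$.)

Proof. Observe that $\gamma \subseteq Y_x$ if and only if $x \in \sigma$.

We discuss the case $\tilde{Y} = \emptyset$ separately, for clarity. We need to show that $\mathrm{Lk}(\Phi_R, \gamma) = \{\emptyset\}$. If $\mathrm{Lk}(\Phi_R, \gamma) \neq \{\emptyset\}$, then there exists some $y \in \mathrm{verts}(\mathrm{Lk}(\Phi_R, \gamma))$. By definition of link, $y \notin \gamma$ and there exists $x \in X$ such that $(x, y') \in R$ for all $y' \in \gamma \cup \{y\}$. That means $x \in \sigma$, so $y \in \tilde{Y}$, a contradiction.

The converse is true as well: If $\mathrm{Lk}(\Phi_R, \gamma) = \{\emptyset\}$, then $\tilde{Y} = \emptyset$. For if some $x \in \sigma$ has an attribute $y$ in addition to all those in $\gamma$, then $y$ would be a vertex in the link.

Now suppose $\tilde{Y} \neq \emptyset$:

I. If $\xi \in \mathrm{Lk}(\Phi_R, \gamma)$, then $\xi \cap \gamma = \emptyset$ and there exists $x \in X$ such that $(x, y) \in R$ for every $y \in \xi \cup \gamma$. So $\xi \subseteq Y_x \setminus \gamma$ and $x \in \psi_R(\gamma) = \sigma$. Thus $(x, y) \in Q$ for every $y \in \xi$, meaning $\xi \in \Phi_Q$.

II. Conversely, if $\xi \in \Phi_Q$, then there exists $x \in \sigma$ such that $(x, y) \in Q \subseteq R$ for every $y \in \xi$. By definition of $\sigma$, $(x, y) \in R$ for every $y \in \gamma$. Combining these two assertions, we see that $(x, y) \in R$ for every $y \in \xi \cup \gamma$. And $\xi \cap \gamma = \emptyset$ since $\xi \subseteq \tilde{Y}$. So $\xi \in \mathrm{Lk}(\Phi_R, \gamma)$.  □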

Comment: There is a dual version of this lemma for links of individuals $\sigma$, modeling $\mathrm{Lk}(\Psi_R, \sigma)$ as $\Psi_Q$ for an appropriate relation $Q$. We see an instance of that in Theorem 9 on page 100, with $\sigma$ consisting of a single individual $x$.
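Lemma 54 can be checked by brute force: compute the link of $\gamma$ in $\Phi_R$ directly from the definition, build the restricted relation $Q$, and compare with $\Phi_Q$. The sketch below does this for every simplex $\gamma$ of a small hypothetical relation. The variable `Y_tilde` stands for the reduced attribute set in the lemma, and all function names are assumptions for illustration.

```python
from itertools import combinations

def subsets(s):
    items = sorted(s)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def psi(rows, gamma):
    return frozenset(x for x in rows if set(gamma) <= rows[x])

def Phi(rows, Y):
    """Phi_R: attribute sets having at least one witness (empty simplex included)."""
    return {g for g in subsets(Y) if psi(rows, g)}

def link(cx, sigma):
    return {t for t in cx if not (t & sigma) and (t | sigma) in cx}

def link_relation(rows, gamma):
    """The relation Q of Lemma 54, returned together with its attribute set."""
    sigma = psi(rows, gamma)
    Y_tilde = set().union(*(rows[x] for x in sigma)) - gamma
    return {x: rows[x] & Y_tilde for x in sigma}, Y_tilde

# Hypothetical relation with no blank rows or columns.
rows = {1: {"a", "b", "c"}, 2: {"a", "b"}, 3: {"b", "d"}, 4: {"d"}}
Y = {"a", "b", "c", "d"}
Phi_R = Phi(rows, Y)

gamma = frozenset({"b"})
Q, Y_tilde = link_relation(rows, gamma)
print(sorted(map(sorted, Phi(Q, Y_tilde))))
print(link(Phi_R, gamma) == Phi(Q, Y_tilde))

# The same comparison for every simplex gamma of Phi_R.
all_match = all(link(Phi_R, g) == Phi(*link_relation(rows, g)) for g in Phi_R)
print(all_match)
```

When the reduced attribute set is empty, `Phi` returns the empty complex, which matches the convention adopted in the lemma.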

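With notation and construction as in Lemma 54, the following formulas hold, assuming $\tilde{Y} \neq \emptyset$:

•  Suppose $\xi \subseteq \tilde{Y}$ and define $\tau = \xi \cup \gamma$. Then

$$\psi_Q(\xi) = \bigcap_{y \in \xi} (X_y \cap \sigma) = \Big(\bigcap_{y \in \xi} X_y\Big) \cap \Big(\bigcap_{y \in \gamma} X_y\Big) = \bigcap_{y \in (\xi \cup \gamma)} X_y = \psi_R(\tau).$$

Notes: We allow $\xi = \emptyset$, since $\psi_Q(\emptyset) = \sigma = \psi_R(\gamma)$. We do not require $\xi \in \Phi_Q$. The equalities hold regardless. Of course, $\xi \in \Phi_Q$ if and only if $\psi_Q(\xi) \neq \emptyset$.

•  Suppose $\emptyset \neq \kappa \subseteq \sigma$. Then

$$\phi_Q(\kappa) = \bigcap_{x \in \kappa} (Y_x \cap \tilde{Y}) = \bigcap_{x \in \kappa} (Y_x \setminus \gamma) = \phi_R(\kappa) \setminus \gamma.$$

And thus also $\phi_R(\kappa) = \phi_Q(\kappa) \cup \gamma$, since $\gamma \subseteq Y_x$ for all $x \in \sigma$.

Notes: Here we do not allow $\kappa = \emptyset$, since $\phi_Q(\emptyset) = \tilde{Y}$ whereas $\phi_R(\emptyset) = Y$. It need not be true that $Y = \tilde{Y} \cup \gamma$. Again, $\kappa \in \Psi_Q$ if and only if $\phi_Q(\kappa) \neq \emptyset$, this valid also for $\kappa = \emptyset$.

Comment: If $\tilde{Y} = \emptyset$, the previous formulas still hold, albeit trivially. However, testing for membership in $\Psi_Q$ via the question “Is $\phi_Q(\kappa)$ nonempty?” no longer makes sense.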


Lemma 55. Let $R$ be a relation on $X \times Y$, with both $X$ and $Y$ nonempty. Suppose $\gamma \subseteq Y$. Then $\mathrm{dl}(\Phi_R, \gamma) = \Phi_{Q'}$, with $Q'$ formed from $R$ by removing the columns corresponding to $\gamma$, that is, $Q' = R|_{X \times (Y \setminus \gamma)}$. (Here we let $\Psi_{Q'}$ and $\Phi_{Q'}$ each be the empty complex if $\gamma = Y$.)

Proof. An individual $x \in X$ is a witness to a set of attributes $\xi \subseteq Y \setminus \gamma$ in $R$ if and only if $x$ is a witness to $\xi$ in $Q'$.  □
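The deletion statement of Lemma 55 admits the same kind of brute-force check: remove the columns in $\gamma$ and compare the resulting Dowker complex with the deletion computed from the definition. The relation below is hypothetical and the helper names are assumptions for this sketch; it also evaluates the formula $\phi_{Q'}(\kappa) = \phi_R(\kappa) \setminus \gamma$ for every $\kappa \subseteq X$.

```python
from itertools import combinations

def subsets(s):
    items = sorted(s)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def phi(rows, Y, sigma):
    return frozenset(y for y in Y if all(y in rows[x] for x in sigma))

def psi(rows, gamma):
    return frozenset(x for x in rows if set(gamma) <= rows[x])

def Phi(rows, Y):
    return {g for g in subsets(Y) if psi(rows, g)}

def deletion(cx, sigma):
    return {t for t in cx if not (t & sigma)}

# Hypothetical relation and a set of attributes (columns) to delete.
rows = {1: {"a", "b", "c"}, 2: {"a", "b"}, 3: {"b", "d"}, 4: {"d"}}
Y = {"a", "b", "c", "d"}
gamma = frozenset({"a"})

# Q' of Lemma 55: the same individuals, with the columns in gamma removed.
Q_prime = {x: rows[x] - gamma for x in rows}
Y_prime = Y - gamma

same_complex = Phi(Q_prime, Y_prime) == deletion(Phi(rows, Y), gamma)
formula_holds = all(phi(Q_prime, Y_prime, k) == phi(rows, Y, k) - gamma
                    for k in subsets(rows))
# For kappa = {3}, adding gamma back does not recover phi_R(kappa).
kappa = frozenset({3})
differs = phi(rows, Y, kappa) != (phi(Q_prime, Y_prime, kappa) | gamma)

print(same_complex, formula_holds, differs)
```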

With notation and construction as in Lemma 55, the following formulas hold, assuming $\gamma \neq Y$:

•  If $\xi \subseteq (Y \setminus \gamma)$, then $\psi_{Q'}(\xi) = \bigcap_{y \in \xi} X_y = \psi_R(\xi)$.

•  If $\kappa \subseteq X$, then $\phi_{Q'}(\kappa) = \bigcap_{x \in \kappa} (Y_x \setminus \gamma) = \phi_R(\kappa) \setminus \gamma$.

Caution: It need not be true that $\phi_R(\kappa) = \phi_{Q'}(\kappa) \cup \gamma$.


Comments: (1) The first formula holds for $\xi = \emptyset$ and the second formula holds for $\kappa = \emptyset$. (2) The simplex tests hold: $\xi \in \Phi_{Q'}$ if and only if $\psi_{Q'}(\xi) \neq \emptyset$, and $\kappa \in \Psi_{Q'}$ if and only if $\phi_{Q'}(\kappa) \neq \emptyset$. (3) If $\gamma = Y$, the formulas still hold, but testing for membership in $\Psi_{Q'}$ via the question “Is $\phi_{Q'}(\kappa)$ nonempty?” no longer makes sense.

Recall: A relation $R$ preserves attribute privacy when the closure operator $\phi_R \circ \psi_R$ is the identity on $\Phi_R$ and it preserves association privacy when the closure operator $\psi_R \circ \phi_R$ is the identity on $\Psi_R$.

Lemma 56. Let $R$ be a relation on $X \times Y$, with both $X$ and $Y$ nonempty. Suppose $\gamma \in \Phi_R$.

If $\phi_R \circ \psi_R$ is the identity on $\Phi_R$, then the corresponding closure operators for the relations modeling $\mathrm{Lk}(\Phi_R, \gamma)$ and $\mathrm{dl}(\Phi_R, \gamma)$ are also identities.

Technicality: The operators are formally defined as self-maps on the face posets of the simplicial complexes mentioned in the lemma, but we can extend each operator to the empty simplex and therefore think of it as a self-map on a simplicial complex viewed as a collection of simplices.

Proof. Define $Q$ as in Lemma 54. That lemma tells us $\Phi_Q = \mathrm{Lk}(\Phi_R, \gamma)$.

Given $\xi \in \Phi_Q$, let $\tau = \xi \cup \gamma$ and calculate:

$$(\phi_Q \circ \psi_Q)(\xi) = \phi_Q(\psi_R(\tau)) = \phi_R(\psi_R(\tau)) \setminus \gamma = \tau \setminus \gamma = \xi.$$

Define $Q'$ as in Lemma 55. That lemma tells us $\Phi_{Q'} = \mathrm{dl}(\Phi_R, \gamma)$.

Given $\xi \in \Phi_{Q'}$, calculate:

$$(\phi_{Q'} \circ \psi_{Q'})(\xi) = \phi_{Q'}(\psi_R(\xi)) = \phi_R(\psi_R(\xi)) \setminus \gamma = \xi \setminus \gamma = \xi.  \quad □$$
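Lemma 56 can be illustrated on a relation that preserves attribute privacy. The sketch below uses a hypothetical relation whose Dowker complex $\Phi_R$ is a hollow triangle, builds the link relation of Lemma 54 and the deletion relation of Lemma 55 for each simplex $\gamma$, and checks that the attribute closure operator remains the identity. All names in the code are assumptions for this example.

```python
from itertools import combinations

def subsets(s):
    items = sorted(s)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def phi(rows, Y, sigma):
    return frozenset(y for y in Y if all(y in rows[x] for x in sigma))

def psi(rows, gamma):
    return frozenset(x for x in rows if set(gamma) <= rows[x])

def preserves_attribute_privacy(rows, Y):
    """True when phi o psi fixes every simplex of Phi, the empty simplex included."""
    return all(phi(rows, Y, psi(rows, g)) == g for g in subsets(Y) if psi(rows, g))

def link_relation(rows, gamma):
    """Relation Q of Lemma 54, with its attribute set."""
    sigma = psi(rows, gamma)
    Y_tilde = set().union(*(rows[x] for x in sigma)) - gamma
    return {x: rows[x] & Y_tilde for x in sigma}, Y_tilde

def deletion_relation(rows, Y, gamma):
    """Relation Q' of Lemma 55, with its attribute set."""
    return {x: rows[x] - gamma for x in rows}, Y - gamma

# Hypothetical relation: three individuals, each holding two of three attributes.
rows = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"a", "c"}}
Y = {"a", "b", "c"}
Phi_R = [g for g in subsets(Y) if psi(rows, g)]

base_ok = preserves_attribute_privacy(rows, Y)
links_ok = all(preserves_attribute_privacy(*link_relation(rows, g)) for g in Phi_R)
deletions_ok = all(preserves_attribute_privacy(*deletion_relation(rows, Y, g)) for g in Phi_R)
print(base_ok, links_ok, deletions_ok)
```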

Here  is  a  variation,  in  which  one  again  computes  a  link  of  attributes  but  then  considers  the 
closure  operator  on  the  dual  complex,  i.e.,  within  the  space  of  individuals: 

Lemma 57. Let $R$ be a relation on $X \times Y$, with both $X$ and $Y$ nonempty. Suppose $\gamma \in \Phi_R$.
Let $Q$, $\sigma$, and $\tilde{Y}$ be as in the construction of Lemma 54. Assume $|\sigma| > 1$ and $\tilde{Y} \neq \emptyset$.

If $\psi_R \circ \phi_R$ is the identity on $\Psi_R$, then $\psi_Q \circ \phi_Q$ is the identity on $\Psi_Q$.

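Proof. Suppose $\emptyset \neq \kappa \in \Psi_Q$. Observe that $\gamma \subseteq \phi_R(\kappa)$ and calculate:

$(\psi_Q \circ \phi_Q)(\kappa) = \psi_Q(\phi_R(\kappa) \setminus \gamma) = \psi_R(\phi_R(\kappa)) = \kappa.$

Additionally,

$(\psi_Q \circ \phi_Q)(\emptyset) = \psi_Q(\tilde{Y}) = \psi_R(\tilde{Y} \cup \gamma) = \psi_R\big(\bigcup_{x \in \sigma} Y_x\big) = \bigcap_{x \in \sigma} \psi_R(Y_x) = \bigcap_{x \in \sigma} (\psi_R \circ \phi_R)(\{x\}) = \bigcap_{x \in \sigma} \{x\} = \emptyset.$

The last equality holds since $|\sigma| > 1$. In short, $(\psi_Q \circ \phi_Q)(\kappa) = \kappa$ for all $\kappa \in \Psi_Q$.  □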

Comment: When $\tilde{Y} = \emptyset$, we take $\Psi_Q$ and $\Phi_Q$ to be the empty simplicial complex $\{\emptyset\}$. It
is sensible to say that $\phi_Q \circ \psi_Q$ is the identity on $\Phi_Q$ since $\phi_Q(\psi_Q(\emptyset)) = \phi_Q(\sigma) = \emptyset$. It could be
confusing to say that $\psi_Q \circ \phi_Q$ is the identity on $\Psi_Q$ since $\psi_Q(\phi_Q(\emptyset)) = \psi_Q(\tilde{Y}) = \psi_Q(\emptyset) = \sigma$,
though perhaps one could argue that there should be no association inference in $Q$ since there
are no attributes.

Corollary 58. Let $R$ be a relation on $X \times Y$, with both $X$ and $Y$ nonempty. Suppose $\gamma \in \Phi_R$.
Let $Q$ and $\tilde{Y}$ be as in the construction of Lemma 54. Assume $\tilde{Y} \neq \emptyset$.

If $R$ preserves both attribute and association privacy, then so does $Q$.

Proof. Relation $Q$ preserves attribute privacy by Lemma 56. Let $\sigma = \psi_R(\gamma)$. If we can show
that $|\sigma| > 1$, then $Q$ preserves association privacy by Lemma 57.

Observe that $|\sigma| > 0$, since $\gamma \in \Phi_R$. If $\sigma$ consists of a single individual $x \in X$, then

$\gamma = (\phi_R \circ \psi_R)(\gamma) = \phi_R(\sigma) = Y_x = \tilde{Y} \cup \gamma,$

which is impossible for nonempty $\tilde{Y}$.  □

The following lemma formalizes the intuition that a set of attributes $\gamma$ implies another
attribute $y$ precisely when the columns corresponding to $\gamma$ have nonempty intersection and
that intersection is a subset of the column corresponding to $y$.

Lemma 59. Let $R$ be a relation on $X \times Y$, with both $X$ and $Y$ nonempty.

$R$ preserves attribute privacy if and only if the following condition is true:

For all $\gamma \in \Phi_R$ and all $y \in Y$, if $\psi_R(\gamma) \subseteq \psi_R(\{y\})$ then $y \in \gamma$.

Proof. I. Suppose there exist $\gamma \in \Phi_R$ and $y \in Y$ such that $\psi_R(\gamma) \subseteq \psi_R(\{y\})$ but $y \notin \gamma$.
Since $\phi_R \circ \psi_R$ is a closure operator, $y \in (\phi_R \circ \psi_R)(\{y\})$ and $\gamma \subseteq (\phi_R \circ \psi_R)(\gamma)$. Now observe
that $(\phi_R \circ \psi_R)(\{y\}) \subseteq (\phi_R \circ \psi_R)(\gamma)$ by supposition and because $\phi_R$ is inclusion-reversing.
Consequently, $(\phi_R \circ \psi_R)(\gamma)$ must be a proper superset of $\gamma$, telling us there is attribute inference.

II. If there is attribute inference, then for some $\gamma \in \Phi_R$, $\gamma \subsetneq (\phi_R \circ \psi_R)(\gamma)$. Pick some
$y \in (\phi_R \circ \psi_R)(\gamma) \setminus \gamma$. Then $y \notin \gamma$ but

$\psi_R(\gamma) = \psi_R\big((\phi_R \circ \psi_R)(\gamma)\big) \subseteq \psi_R\big((\phi_R \circ \psi_R)(\gamma) \setminus \gamma\big) \subseteq \psi_R(\{y\}).$

(The equality holds by associativity of $\circ$ and the dual version of Corollary 45 on page 87.
The two subset relations hold by inclusion-reversal of $\psi_R$.)

(Technical comment: In both parts above, $\gamma = \emptyset$ is permissible.)  □
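
The equivalence in Lemma 59 can be checked mechanically on small relations. The sketch below is illustrative only: it assumes a relation is stored as a dictionary mapping each individual to a frozenset of attributes, and the names `psi`, `phi`, `preserves_by_closure`, and `preserves_by_lemma59` are hypothetical helpers, not notation from the report.

```python
from itertools import combinations

def psi(rows, gamma):
    """psi_R: individuals having every attribute in gamma (all of X if gamma is empty)."""
    return frozenset(x for x, ys in rows.items() if gamma <= ys)

def phi(rows, attrs, sigma):
    """phi_R: attributes shared by every individual in sigma (all of Y if sigma is empty)."""
    shared = frozenset(attrs)
    for x in sigma:
        shared &= rows[x]
    return shared

def subsets(items):
    """All subsets of items, as frozensets."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def preserves_by_closure(rows, attrs):
    """Definition: phi_R o psi_R fixes every simplex gamma of Phi_R."""
    return all(phi(rows, attrs, psi(rows, g)) == g
               for g in subsets(attrs) if psi(rows, g))

def preserves_by_lemma59(rows, attrs):
    """Lemma 59 condition: psi_R(gamma) inside psi_R({y}) forces y in gamma."""
    for g in subsets(attrs):
        witnesses = psi(rows, g)
        if not witnesses:
            continue  # gamma is not a simplex of Phi_R
        for y in attrs:
            if y not in g and witnesses <= psi(rows, frozenset([y])):
                return False
    return True

# Two toy relations (assumed examples): the 2 x 2 diagonal, and one where a implies b.
identity = {"x1": frozenset({"y1"}), "x2": frozenset({"y2"})}
leaky = {"x1": frozenset({"a", "b"}), "x2": frozenset({"b"})}
print(preserves_by_closure(identity, {"y1", "y2"}), preserves_by_lemma59(identity, {"y1", "y2"}))
print(preserves_by_closure(leaky, {"a", "b"}), preserves_by_lemma59(leaky, {"a", "b"}))
```

Both tests enumerate all subsets of $Y$, so this is only practical for very small attribute sets; it is meant as a sanity check of the statement, not as an efficient algorithm.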

Recall  the  following  definition: 

Definition 6 (Unique Identifiability). Let $R$ be a relation on $X \times Y$ and suppose $x \in X$.

We say that $x$ is uniquely identifiable via relation $R$ when $\psi_R(Y_x) = \{x\}$.

Comment: It is entirely possible that one or more proper subsets $\gamma$ of $Y_x$ already identifies
$x$, meaning $\psi_R(\gamma) = \{x\}$. Certainly $x$ is uniquely identifiable in that case. Moreover, the
attributes $Y_x \setminus \gamma$ can be inferred from $\gamma$.

Lemma 60. Let $R$ be a relation on $X \times Y$ that preserves attribute privacy. Let $x \in X$. Then
no proper subset of $Y_x$ identifies $x$.

Proof. Suppose for some $x \in X$ and some $\gamma \subsetneq Y_x$, $\psi_R(\gamma) = \{x\}$. A contradiction ensues:

$\gamma \subsetneq Y_x = \phi_R(\{x\}) = (\phi_R \circ \psi_R)(\gamma) = \gamma.$  □

We turn now to proving the assertions of Section 5, regarding free faces.

Lemma 61. Let $R$ be a relation on $X \times Y$, with both $X$ and $Y$ nonempty. If $\Phi_R$ contains no
free faces, then $R$ preserves attribute privacy.

Proof. We will show that $\phi_R \circ \psi_R$ is the identity on $\Phi_R$.

As usual, $(\phi_R \circ \psi_R)(\emptyset) = \phi_R(X)$. We therefore need to show that $\phi_R(X) = \emptyset$. Observe that
every maximal simplex of $\Phi_R$ contains $\phi_R(X)$, since any witness for such a simplex must have
all the attributes in $\phi_R(X)$. Pick some maximal simplex $\eta$ of $\Phi_R$ and consider $\gamma = \eta \setminus \phi_R(X)$.
Let $\eta'$ be any maximal simplex of $\Phi_R$ containing $\gamma$. Then

$\eta = \gamma \cup \phi_R(X) \subseteq \eta' \cup \phi_R(X) = \eta'.$

So $\eta = \eta'$ by maximality. Since $\Phi_R$ has no free faces, $\gamma$ cannot be a proper subset of $\eta$,
meaning $\phi_R(X) = \emptyset$, as desired.

Now consider $\emptyset \neq \gamma \in \Phi_R$. Suppose $\gamma$ is a proper subset of $(\phi_R \circ \psi_R)(\gamma)$. By Corollary 43
and Lemma 47 on pages 87 and 88, respectively, we can assume without loss of generality that
$\gamma = \eta \setminus \{y\}$ for some maximal $\eta$ of $\Phi_R$ and some $y \in \eta$. Observe that

$\eta \setminus \{y\} = \gamma \subsetneq (\phi_R \circ \psi_R)(\gamma) \subseteq (\phi_R \circ \psi_R)(\eta) = \eta,$

so $\eta = (\phi_R \circ \psi_R)(\gamma)$. Now let $\eta'$ be any maximal simplex of $\Phi_R$ containing $\gamma$. Then

$\eta = (\phi_R \circ \psi_R)(\gamma) \subseteq (\phi_R \circ \psi_R)(\eta') = \eta'.$

(Note: The last equality in each of the lines of comparisons above follows from Corollary 43
by maximality.)

So $\eta = \eta'$ by maximality. That says $\gamma$ is a free face of $\Phi_R$, a contradiction.  □

The  converse  of  Lemma  61  need  not  hold  if  there  exists  an  individual  who  can  hide,  with 
attributes  that  form  a  strict  subset  of  some  other  individual’s  attributes.  However: 

Lemma 62. Let $R$ be a relation on $X \times Y$, with both $X$ and $Y$ nonempty. If $R$ preserves
attribute privacy and if every $x \in X$ is uniquely identifiable via $R$, then $\Phi_R$ contains no free
faces.

Proof. Suppose that $\gamma$ is a free face of $\Phi_R$. We can assume without loss of generality that
$\gamma = \eta \setminus \{y\}$ for some maximal $\eta \in \Phi_R$ and $y \in \eta$. Since a Dowker attribute complex is
generated by the rows of the underlying relation, it must be that $\eta = Y_x$ for at least one
$x \in X$. By Lemma 60, there is at least one $x'$ besides $x$ in $\psi_R(\gamma)$. Then

$\gamma = (\phi_R \circ \psi_R)(\gamma) \subseteq \phi_R(\{x, x'\}) = Y_x \cap Y_{x'}.$

Since we have assumed that $\gamma$ is free and $Y_x$ is maximal, we see that $Y_{x'}$ must be a subset
of $Y_x$. That means $x'$ is not uniquely identifiable, a contradiction.

(Technical comment: $\gamma = \emptyset$ is permissible throughout this argument.)  □

The following lemma will help us later in Appendix E, to establish the remaining assertions
of Sections 5 and 7.

Lemma 63. Let $R$ be a relation on $X \times Y$ such that $|X| = |Y| > 1$. If $R$ has no blank columns
and preserves attribute privacy, then every $x \in X$ is uniquely identifiable via $R$.

Proof. The proof is by induction on $n = |X| = |Y|$.

I. The base case $n = 2$ implies that $R$ is isomorphic to

      R   |  y1   y2
     -----+----------
      x1  |  •
      x2  |       •

(Any other type of $2 \times 2$ relation without blank columns would allow for attribute inference.)
Each $x_i$ is uniquely identifiable in $R$ above.

II.  For  the  induction  step,  assume  that,  for  some  n  >  2,  the  lemma  holds  for  all  relations 
with  X  and  Y  spaces  of  size  strictly  less  than  n.  We  need  to  establish  the  lemma  for  all 
relations  with  X  and  Y  spaces  of  size  n. 

Subclaim:  R  has  no  blank  rows. 

To see this, suppose that $Y_x = \emptyset$ for some $x \in X$. Let $Q$ be the restriction of $R$ to
$X' \times Y$, with $X' = X \setminus \{x\}$. There is no significant difference between $R$ and $Q$; in
particular, $Q$ also preserves attribute privacy.

(Perhaps the empty simplex is slightly tricky: $(\phi_Q \circ \psi_Q)(\emptyset) = \bigcap_{x' \in X'} Y_{x'}$. If this
intersection is nonempty, it contains some $y_1 \in Y$. Pick $y_2 \in Y$ with $y_2 \neq y_1$; this
is possible since $|Y| \geq 2$. Note that $X_{y_2} \subseteq X'$, so
$y_1 \in \bigcap_{x' \in X'} Y_{x'} \subseteq \bigcap_{x' \in X_{y_2}} Y_{x'} = (\phi_R \circ \psi_R)(\{y_2\}) = \{y_2\}$,
a contradiction. So $(\phi_Q \circ \psi_Q)(\emptyset) = \emptyset$.)

Now let $Q'$ be the further restriction of $R$ to $X' \times Y'$, where $Y' = Y \setminus \{y\}$, with $y$
any element of $Y$. By Lemma 56 on page 91, $Q'$ preserves attribute privacy. The
underlying $X$ and $Y$ spaces of $Q'$ each have size $n - 1$ and $Q'$ has no blank columns.

The induction hypothesis therefore tells us that every individual in $X'$ is uniquely
identifiable via $Q'$. Bearing in mind that $x$ does not appear in any $X_{y'}$, one sees that
for each $x' \in X'$, there is some $\gamma \subseteq Y'$ such that $\bigcap_{y' \in \gamma} X_{y'} = \{x'\}$. That intersection
is a column vector all of whose entries are 0 (blank) except for the entry indexed by
$x'$. Since $R$ preserves attribute privacy and $x'$ is arbitrary in $X'$, Lemma 59 implies
that in fact $X_y = \emptyset$, contradicting the assumption that $R$ has no blank columns.

Next,  pick  x  G  X.  We  will  show  that  x  is  uniquely  identifiable  via  R.  Without  loss  of 
generality,  write  R  as  in  Figure  51  (“blank”  entries  are  indicated  by  “0”s): 

Specifically, pick some $y \in Y$ such that $(x, y) \in R$. This is possible since $R$ has no blank
rows. Then decompose $X = X_1 \cup X_2$, with $X_1 = X_y$ and $X_2 = X \setminus X_1$. Since $R$ preserves
attribute privacy, $X_2 \neq \emptyset$.

Figure 51: Relation $R$ decomposed into blocks for the proof of Lemma 63.

Let $Q$ model $\mathrm{Lk}(\Phi_R, y)$. So $Q$ is $R$ restricted to $X_1 \times Y_1$, with $Y_1 = \bigcup_{x' \in X_1} Y_{x'} \setminus \{y\}$. If
$Y_1 \neq \emptyset$, then $Q$ preserves attribute privacy, by Lemma 56, and $Q$ has no blank columns.

Now write $Y$ as the disjoint union $Y = \{y\} \cup Y_1 \cup Y_2$, with $Y_2 = Y \setminus (Y_1 \cup \{y\})$.

Observe that no element of $X_2$ has attribute $y$. Observe further that every element in $X_1$
has attribute $y$ but has no attributes in $Y_2$, by construction.

Let $A$ be the restriction of $R$ to $X_2 \times Y_1$ and let $B$ be the restriction of $R$ to $X_2 \times Y_2$.

If $Y_2 \neq \emptyset$, then $B$ has no blank columns and $\Phi_B = \mathrm{dl}(\Phi_R, Y_1 \cup \{y\})$. If $|Y_2| \geq 2$, then the
blank rows indexed by $X_1$ that remain after deleting from $R$ the columns indexed by $Y_1 \cup \{y\}$
are irrelevant and so $B$ preserves attribute privacy (by Lemma 56 and by an argument similar
to that appearing in the proof of the Subclaim on page 95).

Let’s  look  at  some  cases: 

•  $|Y_2| \geq |X_2| = 1$: Then any attribute of $Y_2$ identifies the one element of $X_2$. Since $R$
preserves attribute privacy, this implies both that $|Y_2| = 1$ and that relation $A$ is blank.
Consequently, every attribute in $Y_1$ implies $y$ in $R$. Since $R$ preserves attribute privacy,
we conclude that $Y_1 = \emptyset$. That means we are actually in the base case, with $n = 2$.

•  $|Y_2| > |X_2| \geq 2$: By removing some columns of $B$, we obtain a square relation to which
we can apply the induction hypothesis. That means every $x' \in X_2$ is uniquely identifiable
by the remaining columns. Since $B$ preserves attribute privacy, that means the columns
removed must have been blank, a contradiction.

•  $|Y_2| = |X_2| \geq 2$: We can apply the induction hypothesis directly to $B$. That again tells
us that every $x' \in X_2$ is uniquely identifiable by columns of $Y_2$, both in $B$ and in $R$.
We conclude that relation $A$ must be blank and so $Y_1 = \emptyset$, arguing as above. Thus
$|X_2| = |Y_2| = n - 1$, implying $|X_1| = 1$. So $y$ uniquely identifies $x$, as desired.

•  $|Y_2| < |X_2|$: This means $|Y_1| \geq |X_1|$. Additionally, $|X_1| \geq 2$, as otherwise $y$ implies
all the attributes of $Y_1$. If actually $|Y_1| > |X_1|$, then we could argue as above to see
that some columns of $Q$ are blank, contrary to the construction of $Q$. So we have
that $|Y_1| = |X_1| \geq 2$ and the induction hypothesis applies. Consequently $x$ is uniquely
identifiable via $Q$, say as $\{x\} = \psi_Q(\gamma)$, for some $\gamma \subseteq Y_1$. If we adjoin $y$, we get that

$\psi_R(\gamma \cup \{y\}) = \psi_Q(\gamma) = \{x\}$, as desired.  □

Theorem 64 (Too Many Attributes). Let $R$ be a relation on $X \times Y$ with no blank columns.
Suppose $|Y| > |X| \geq 1$. Then $R$ does not preserve attribute privacy.

Proof. The proof is a corollary to Lemma 63:

If $|Y| > |X| = 1$, then any one element of $Y$ implies all the others.

Otherwise, suppose $R$ preserves attribute privacy. We have $|Y| > |X| > 1$, so we can
delete some columns of $R$ and apply Lemma 63 to the resulting relation. Every element of
$X$ is therefore uniquely identifiable via the columns retained. Consequently, either there is
attribute inference in $R$ or the discarded columns were blank, a contradiction.  □

Comment: One implication of this result and those in Appendix E is the old detective show
mantra "eliminate suspects": Reduce the number of relevant individuals sufficiently, and some
attribute inference is assured. This amounts to moving from relation $R$ to a subrelation $Q$
representing $\mathrm{dl}(\Psi_R, \sigma)$, with $\sigma$ a set of "eliminated suspects".
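
Theorem 64 can be confirmed by brute force for very small sizes. The following sketch is an illustration under stated assumptions: individuals and attributes are encoded as integer ranges, a relation is a set of (individual, attribute) pairs, and the function names are hypothetical. It lists every relation of a given shape that has no blank columns and still preserves attribute privacy.

```python
from itertools import combinations, product

def preserves_attribute_privacy(X, Y, R):
    """True when phi_R o psi_R is the identity on every simplex of Phi_R."""
    cols = {y: frozenset(x for x in X if (x, y) in R) for y in Y}
    rows = {x: frozenset(y for y in Y if (x, y) in R) for x in X}
    for r in range(len(Y) + 1):
        for gamma in combinations(Y, r):
            witnesses = set(X)
            for y in gamma:
                witnesses &= cols[y]
            if not witnesses:
                continue  # gamma is not a simplex of Phi_R
            closure = set(Y)
            for x in witnesses:
                closure &= rows[x]
            if closure != set(gamma):
                return False
    return True

def privacy_preserving_relations(nx, ny):
    """All relations on nx individuals and ny attributes, without blank columns,
    that preserve attribute privacy (exhaustive search, tiny sizes only)."""
    X, Y = range(nx), range(ny)
    cells = list(product(X, Y))
    found = []
    for bits in product([0, 1], repeat=len(cells)):
        R = {cell for cell, bit in zip(cells, bits) if bit}
        if any(all((x, y) not in R for x in X) for y in Y):
            continue  # skip relations with a blank column
        if preserves_attribute_privacy(X, Y, R):
            found.append(R)
    return found

# Theorem 64 predicts empty lists whenever there are more attributes than individuals.
print(len(privacy_preserving_relations(2, 3)))
print(len(privacy_preserving_relations(3, 4)))
# Square case for comparison: the two diagonal-type 2 x 2 relations survive.
print(privacy_preserving_relations(2, 2))
```

The search is exponential in the number of cells, so it serves only to illustrate the statement on toy sizes.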

D  Inference Hardness

We  have  so  far  spoken  mainly  of  privacy  preservation  overall  in  a  relation.  One  can  also  focus 
on  a  single  individual: 

Definition 65 (Individual Privacy). Let $R$ be a relation on $X \times Y$ and suppose $x \in X$.

We say that $R$ preserves attribute privacy for $x$ whenever $(\phi_R \circ \psi_R)(\gamma) = \gamma$ for all $\gamma \subseteq Y_x$.

We  have  seen  the  following  basic  result  within  the  proofs  of  other  lemmas: 

Lemma 66. Let $R$ be a relation on $X \times Y$, with both $X$ and $Y$ nonempty. Let $x \in X$. Then:

$R$ preserves attribute privacy for $x$
if and only if
$(\phi_R \circ \psi_R)(\gamma) = \gamma$, for all $\gamma$ of the form $\gamma = Y_x \setminus \{y\}$, with $y \in Y_x$.

Proof. I. If $R$ preserves attribute privacy for $x$, then the condition is satisfied by definition.

II. Suppose $R$ does not preserve attribute privacy for $x$. Then for some $\gamma \subseteq Y_x$,
$\gamma \subsetneq (\phi_R \circ \psi_R)(\gamma)$. We know $(\phi_R \circ \psi_R)(Y_x) = Y_x$ by Corollary 46 on page 87, so by Lemma 47
on page 88 we can assume that $\gamma = Y_x \setminus \{y\}$, for some $y \in Y_x$.  □

Lemma 66 tells us that it is fairly easy to check whether an individual's attribute privacy is
preserved. One merely needs to check whether any one attribute is implied by all the remaining
attributes. That may be done quickly since the maps $\phi_R$ and $\psi_R$ amount to set intersections.
Harder is finding a smallest set of attributes that implies another of the individual's attributes.
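
As a rough illustration of the check that Lemma 66 licenses, the sketch below compares the full test of Definition 65 (all subsets of $Y_x$) with the shortcut that only tests the sets $Y_x \setminus \{y\}$. The dictionary encoding of the relation, the sample data, and the function names are assumptions made for this example.

```python
from itertools import combinations

def closure(rows, gamma):
    """(phi_R o psi_R)(gamma); gamma must have at least one witness."""
    witnesses = [ys for ys in rows.values() if gamma <= ys]
    return frozenset.intersection(*witnesses)

def private_for_by_definition(rows, x):
    """Definition 65: the closure operator fixes every subset of Y_x."""
    yx = sorted(rows[x])
    return all(closure(rows, frozenset(g)) == frozenset(g)
               for r in range(len(yx) + 1) for g in combinations(yx, r))

def private_for_by_lemma66(rows, x):
    """Lemma 66: test only the |Y_x| sets obtained by dropping one attribute."""
    yx = rows[x]
    return all(closure(rows, yx - {y}) == yx - {y} for y in yx)

# Hypothetical relation: each individual maps to a frozenset of attributes.
rows = {
    "alice": frozenset({"runs", "swims", "cycles"}),
    "bob": frozenset({"runs", "swims"}),
    "carol": frozenset({"cycles"}),
}
for person in rows:
    print(person, private_for_by_definition(rows, person), private_for_by_lemma66(rows, person))
```

The shortcut needs only $|Y_x|$ closure computations, each a handful of set intersections, whereas the definitional test examines $2^{|Y_x|}$ subsets.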

Influenced  by  Lemma  59  on  page  93,  we  formulate  the  following  problem: 

Definition 67 (Minimal Inference). MinInf is the following decision problem:

Given relation $R$ on $X \times Y$, $x \in X$, $y \in Y$, and $k \geq 0$, is there a simplex $\gamma \in \Phi_R$ with
$\gamma \subseteq Y_x \setminus \{y\}$ such that $|\gamma| \leq k$ and $\psi_R(\gamma) \subseteq \psi_R(\{y\})$?

Lemma 68. MinInf is NP-complete.

Proof.  (A)  Observe  that  the  problem  lies  in  NP:  Given  some  7,  one  can  verify  the  stated 
conditions  in  polynomial  time.  The  verifications  amount  to  set  intersection,  cardinality,  and 
subset  computations,  drawn  from  the  columns  and  one  row  of  R. 

(B) We will establish NP-hardness by a reduction from Set Cover. Recall: Given a
collection of sets $\{S_1, \ldots, S_m\}$, Set Cover asks whether there is some subcollection of size at
most $k$ such that the union of the subcollection is the overall union (often called the universe).

Given an instance of the Set Cover problem, we define the following relation:

•  $X = \{x_0\} \cup \bigcup_{i=1}^{m} S_i$, with $x_0$ a new element distinct from any elements in the sets $S_i$.

•  $Y = \{0, 1, \ldots, m\}$.

•  $R = (\{x_0\} \times Y) \cup \bigcup_{i=1}^{m} \{(x, i) \in X \times Y \mid x \in X \setminus S_i\}$.

In words: The 0th column of $R$ is the singleton set $\{x_0\}$ and the $i$th column of $R$, for
$i = 1, \ldots, m$, is $X \setminus S_i$, i.e., the complement of $S_i$ in the original set cover universe, but now
with $x_0$ added. The row for $x_0$ has entries for all possible attributes. All other rows have no
entry in column 0.

Reduction: Given an instance of Set Cover, we transform it into an instance of MinInf
using the relation $R$ given above and by letting $x = x_0$ and $y = 0$. The parameter $k$ is the
same for both problems. Observe that $Y_x \setminus \{y\} = \{1, \ldots, m\}$.

Observe further that $|X| = |\bigcup_{i=1}^{m} S_i| + 1 = n + 1$ and $|Y| = m + 1$, with $n$ the number of
elements in the set cover universe and $m$ the number of subsets specified for the set cover
problem. The reduction can therefore be computed in polynomial time.

To  complete  the  proof,  we  will  establish  the  following: 

Claim:  The  answer  to  Set  Cover  is  “yes”  if  and  only  if  the  answer  to  MinInf  is  “yes”. 

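I. A "yes" answer to Set Cover means that there is some set of indices $\gamma \subseteq \{1, \ldots, m\}$,
with $|\gamma| \leq k$, such that $\bigcup_{j \in \gamma} S_j = \bigcup_{i=1}^{m} S_i$. Therefore, since $0 \notin \gamma$,

$\psi_R(\gamma) = \bigcap_{j \in \gamma} X_j = \bigcap_{j \in \gamma} (X \setminus S_j) = X \setminus \big(\bigcup_{j \in \gamma} S_j\big) = X \setminus \big(\bigcup_{i=1}^{m} S_i\big) = \{x_0\} = \psi_R(\{0\}) = \psi_R(\{y\}).$

In other words, $\emptyset \neq \psi_R(\gamma) \subseteq \psi_R(\{y\})$ with $\gamma \subseteq Y_x \setminus \{y\}$ and $|\gamma| \leq k$, meaning that the
answer to MinInf is "yes" as well.

II. A "yes" answer to MinInf means there is some $\gamma \subseteq \{1, \ldots, m\}$ such that $|\gamma| \leq k$ and
$\emptyset \neq \psi_R(\gamma) \subseteq \psi_R(\{y\})$. Observe that $\psi_R(\{y\}) = \psi_R(\{0\}) = \{x_0\}$ and that

$\psi_R(\gamma) = \bigcap_{j \in \gamma} X_j = \bigcap_{j \in \gamma} (X \setminus S_j) = X \setminus \big(\bigcup_{j \in \gamma} S_j\big).$

The middle equality holds as before because $0 \notin \gamma$.

So we see that $x_0 \in X \setminus \big(\bigcup_{j \in \gamma} S_j\big) \subseteq \{x_0\}$, telling us

$\bigcup_{j \in \gamma} S_j = X \setminus \{x_0\} = \bigcup_{i=1}^{m} S_i.$

That means $\gamma$ describes a set of indices sought for by Set Cover, with $|\gamma| \leq k$, so the
answer to Set Cover is also "yes".  □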
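
The reduction above is simple enough to sketch in code. In the following illustration, `build_relation` constructs the rows of $R$ from a Set Cover instance, and `min_inf` and `set_cover` are naive exponential-time deciders used only to compare answers on a tiny instance. The string label `"x0"` for the new element, the dictionary encoding, and all function names are assumptions of this sketch.

```python
from itertools import combinations

def build_relation(sets):
    """Rows of the reduction relation R: individual -> set of attribute indices.
    Assumes the label "x0" does not already occur in the universe."""
    universe = set().union(*sets)
    m = len(sets)
    x0 = "x0"
    rows = {x0: set(range(m + 1))}  # the row for x0 has every attribute 0..m
    for u in universe:
        # u has attribute i exactly when u lies in X \ S_i, i.e. u is not in S_i
        rows[u] = {i for i, s in enumerate(sets, start=1) if u not in s}
    return rows, x0

def psi(rows, gamma):
    """psi_R: individuals having every attribute in gamma."""
    return {x for x, ys in rows.items() if set(gamma) <= ys}

def min_inf(rows, x, y, k):
    """Naive MinInf decider: try every gamma inside Y_x minus {y} of size at most k."""
    candidates = sorted(rows[x] - {y})
    target = psi(rows, [y])
    for r in range(k + 1):
        for gamma in combinations(candidates, r):
            witnesses = psi(rows, gamma)
            if witnesses and witnesses <= target:
                return True
    return False

def set_cover(sets, k):
    """Naive Set Cover decider: is there a subcollection of size at most k covering the universe?"""
    universe = set().union(*sets)
    return any(set().union(*chosen) == universe
               for r in range(k + 1) for chosen in combinations(sets, r))

sets = [{1, 2}, {2, 3}, {3, 4}, {1, 4}]
rows, x0 = build_relation(sets)
for k in range(len(sets) + 1):
    print(k, set_cover(sets, k), min_inf(rows, x0, 0, k))
```

On this instance the two deciders agree for every $k$, as the Claim requires; the construction itself runs in time polynomial in $n$ and $m$, while the two deciders are deliberately brute force.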

E  Privacy Spheres

The  aim  of  this  appendix  is  to  characterize  privacy  and  inference  in  terms  of  spheres.  Spheres 
exhibit  homogeneity,  which  is  good  for  privacy,  while  still  admitting  a  coordinate  system  for 
identifiability. 

We  first  prove  a  theorem  characterizing  individual  attribute  privacy,  then  a  generalization 
that  holds  for  arbitrary  elements  of  a  relation’s  poset,  and  finally  a  characterization  of  relations 
that  preserve  both  attribute  and  association  privacy. 

E.1  Individual Attribute Privacy

We  first  need  a  lemma  as  a  tool.  Recall  also  Definitions  6  and  65  (see  pages  93  and  98). 

Lemma 69. Let $R$ be a relation on $X \times Y$. Let $x \in X$ be uniquely identifiable via $R$. Then:

$\big(\bigcap_{y \in Y_x} X_y\big) \setminus \{x\} = \emptyset.$

Moreover, $R$ preserves attribute privacy for $x$ if and only if

$\big(\bigcap_{y \in \gamma} X_y\big) \setminus \{x\} \neq \emptyset$, for all $\gamma \subsetneq Y_x$.

Proof. The first statement follows from the definition of unique identifiability:
$\bigcap_{y \in Y_x} X_y = \psi_R(Y_x) = \{x\}$.

For the second statement:

I. Assume that $R$ preserves attribute privacy for $x$. Let $\gamma \subsetneq Y_x$. If $\big(\bigcap_{y \in \gamma} X_y\big) \setminus \{x\} = \emptyset$,
then $\psi_R(\gamma) = \bigcap_{y \in \gamma} X_y = \{x\}$, since $x \in X_y$ whenever $y \in \gamma \subseteq Y_x$ (when $\gamma = \emptyset$, the vacuous
intersection is all of $X$, containing $x$). That says a proper subset of $Y_x$ identifies $x$, leading to
a contradiction, as in the proof of Lemma 60 on page 93.

II. Assume $\big(\bigcap_{y \in \gamma} X_y\big) \setminus \{x\} \neq \emptyset$ for all proper subsets $\gamma$ of $Y_x$. If $R$ fails to preserve
attribute privacy for $x$, then by Lemma 66 there is some $\gamma$ of the form $Y_x \setminus \{y\}$, with
$y \in Y_x$, such that $\gamma \subsetneq (\phi_R \circ \psi_R)(\gamma) = Y_x$. Applying $\psi_R$ to both sides of that last
equality gives $\psi_R(\gamma) = \psi_R(Y_x) = \{x\}$, by unique identifiability. That is a contradiction,
since $\psi_R(\gamma) = \bigcap_{y \in \gamma} X_y$.  □

We  now  address  our  characterization  of  individual  privacy,  proving  a  theorem  stated  previously: 

Theorem 9 (Individual Attribute Privacy). Let $R$ be a relation on $X \times Y$, with $|X| > 1$.
Suppose $x \in X$ is uniquely identifiable via $R$. Let $Q$ be the relation modeling $\mathrm{Lk}(\Psi_R, x)$.

Then the following three conditions are equivalent:

(a)  $R$ preserves attribute privacy for $x$,

(b)  $\mathrm{Lk}(\Psi_R, x) \simeq \mathbb{S}^{k-2}$, with $k = |Y_x|$,

(c)  $\Phi_Q = \partial(Y_x)$.

Proof. The hypotheses ensure that $Y_x \neq \emptyset$ (and so also $Y \neq \emptyset$). They also ensure that $x$ is a
vertex of $\Psi_R$, so the link is not void. It could be the empty complex $\{\emptyset\}$, of course.

Observe that $Q$ is the restriction of $R$ to $\tilde{X} \times Y_x$, with $\tilde{X} = \bigcup_{y \in Y_x} X_y \setminus \{x\}$.

If $\tilde{X} = \emptyset$, then, reasoning as in the proof of Lemma 54 on page 90, we see that
$\mathrm{Lk}(\Psi_R, x) = \{\emptyset\} = \mathbb{S}^{-1}$. Furthermore, $x$ does not share any of its attributes with any
other individuals in $X$. By convention, $\Phi_Q = \{\emptyset\}$ as well. If $k = |Y_x| = 1$, meaning
$x$ has a single attribute, then $R$ preserves attribute privacy for $x$ since $|X| > 1$. Also,
$\mathbb{S}^{k-2} = \mathbb{S}^{-1} = \{\emptyset\} = \partial(Y_x)$. So conditions (a), (b), (c) all hold. If $k = |Y_x| \geq 2$, then
any one attribute of $Y_x$ implies all the others, so condition (a) does not hold. Moreover,
conditions (b) and (c) also do not hold. In short, the theorem holds when $\tilde{X} = \emptyset$.

We now assume that $\tilde{X} \neq \emptyset$. We then know that $\mathrm{Lk}(\Psi_R, x) = \Psi_Q \simeq \Phi_Q$ by a dual version
of Lemma 54 and by Dowker duality. Definitionally, $\partial(Y_x) \simeq \mathbb{S}^{k-2}$, with $k = |Y_x| > 0$. We
therefore see that (c) implies (b). To see that (b) implies (c), observe that the underlying vertex
set of $\Phi_Q$ is $Y_x$, so $\Phi_Q \simeq \mathbb{S}^{k-2}$ means $\Phi_Q = \partial(Y_x)$, since no proper subset of a sphere can be
homotopic to that same sphere. To prove the theorem we therefore only need to establish that
conditions (a) and (c) are equivalent.

Recall the formulas relating $\phi_Q$ and $\phi_R$ from page 91 and dualize them here. We see that:

$\psi_Q(\chi) = \psi_R(\chi) \setminus \{x\} = \big(\bigcap_{y \in \chi} X_y\big) \setminus \{x\}$, for all $\emptyset \neq \chi \subseteq Y_x$.

I. Assume that $R$ preserves attribute privacy for $x$. By Lemma 69 and the formula above
we see that $\psi_Q(\chi) \neq \emptyset$ for all nonempty proper subsets $\chi$ of $Y_x$ and that $\psi_Q(Y_x) = \emptyset$, since $x$
is uniquely identifiable. Consequently, $\Phi_Q$ contains every nonempty proper subset of $Y_x$ as a
simplex, but does not contain $Y_x$. (Also, $\Phi_Q$ contains the empty simplex since the complex is
not void.) Thus $\Phi_Q = \partial(Y_x)$.

II. Assume that $\Phi_Q = \partial(Y_x)$. Then $\psi_Q(\chi) \neq \emptyset$ for every nonempty proper subset $\chi$ of $Y_x$.
By the formula above, $\big(\bigcap_{y \in \chi} X_y\big) \setminus \{x\} \neq \emptyset$ for each such $\chi$. Now suppose $\chi = \emptyset \subsetneq Y_x$.
Then:

$\emptyset \neq \tilde{X} = \psi_Q(\emptyset) \subseteq X \setminus \{x\} = \big(\bigcap_{y \in \emptyset} X_y\big) \setminus \{x\}.$

So we see that $\big(\bigcap_{y \in \gamma} X_y\big) \setminus \{x\} \neq \emptyset$ for every proper subset $\gamma$ of $Y_x$, implying that $R$
preserves attribute privacy for $x$, by Lemma 69.  □

Comment: It is impossible to satisfy the following three conditions simultaneously:

(1) $x$ is uniquely identifiable, (2) $|Y_x| = 1$, (3) $\tilde{X} \neq \emptyset$.
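
The equivalence of conditions (a) and (c) in Theorem 9 can be explored on small examples. The sketch below is illustrative: it assumes the relation is a dictionary from individuals to frozensets of attributes, builds $\Phi_Q$ directly from the description of $Q$ as the restriction of $R$ to $\tilde{X} \times Y_x$, and uses the convention $\Phi_Q = \{\emptyset\}$ when $\tilde{X} = \emptyset$. The function names and the sample relation are hypothetical.

```python
from itertools import combinations

def subsets(items):
    """All subsets of items, as a list of frozensets."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def psi(rows, gamma):
    """psi_R: individuals having every attribute in gamma."""
    return frozenset(z for z, ys in rows.items() if gamma <= ys)

def closure(rows, gamma):
    """(phi_R o psi_R)(gamma), for gamma with at least one witness."""
    return frozenset.intersection(*(rows[z] for z in psi(rows, gamma)))

def condition_a(rows, x):
    """(a): R preserves attribute privacy for x (Definition 65)."""
    return all(closure(rows, g) == g for g in subsets(rows[x]))

def condition_c(rows, x):
    """(c): Phi_Q equals the boundary complex of the full simplex on Y_x."""
    yx = rows[x]
    # Rows of Q: individuals other than x sharing an attribute with x, restricted to Y_x.
    link_rows = [ys & yx for z, ys in rows.items() if z != x and ys & yx]
    if not link_rows:
        phi_q = {frozenset()}  # convention for an empty X-tilde
    else:
        phi_q = {chi for chi in subsets(yx) if any(chi <= ys for ys in link_rows)}
    boundary = set(subsets(yx)) - {frozenset(yx)}
    return phi_q == boundary

# Hypothetical 3 x 3 relation in which each individual lacks exactly one attribute.
rows = {
    "x1": frozenset({"y1", "y2"}),
    "x2": frozenset({"y1", "y3"}),
    "x3": frozenset({"y2", "y3"}),
}
for x in rows:
    if psi(rows, rows[x]) == frozenset({x}):  # x is uniquely identifiable
        print(x, condition_a(rows, x), condition_c(rows, x))
```

Condition (b) is not computed here; for nonempty $\tilde{X}$ the proof shows it is equivalent to (c), so comparing (a) with (c) already exercises the substance of the theorem.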

E.2  Group Attribute Privacy

We now generalize the previous theorem to arbitrary elements $(\sigma, \gamma)$ of the doubly-labeled
poset $P_R$ associated with a relation $R$. We stated the generalized theorem previously in the
report, as Theorem 10, and replicate that below. One may view this generalized theorem as a
characterization of the conditions under which a group $\sigma$ of individuals has its attribute privacy
preserved, as a whole, not necessarily individually. Theorem 9 is a special case of Theorem 10,
with the "group" a single individual $x$, since $(\{x\}, Y_x) \in P_R$ whenever $x$ is uniquely identifiable
via $R$.

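Theorem 10 (Group Attribute Privacy). Let $R$ be a relation on $X \times Y$.

Suppose $(\sigma, \gamma) \in P_R$, with $\sigma \neq X$. Let $Q$ be the relation modeling $\mathrm{Lk}(\Psi_R, \sigma)$.

Then the following three conditions are equivalent:

(a)  $(\phi_R \circ \psi_R)(\gamma') = \gamma'$, for every subset $\gamma'$ of $\gamma$,

(b)  $\mathrm{Lk}(\Psi_R, \sigma) \simeq \mathbb{S}^{k-2}$, with $k = |\gamma|$,

(c)  $\Phi_Q = \partial(\gamma)$.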


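Proof. Reminder: Since $(\sigma, \gamma) \in P_R$, $\emptyset \neq \sigma \in \Psi_R$, $\emptyset \neq \gamma \in \Phi_R$, $\phi_R(\sigma) = \gamma$, and $\psi_R(\gamma) = \sigma$.

Thus also $(\phi_R \circ \psi_R)(\gamma) = \gamma$, meaning we can focus on proper subsets of $\gamma$ for part (a).

Recall also that $Q$ is the restriction of $R$ to $\tilde{X} \times \gamma$, with $\tilde{X} = \bigcup_{y \in \gamma} X_y \setminus \sigma$.

If $\tilde{X} = \emptyset$, then $\mathrm{Lk}(\Psi_R, \sigma) = \{\emptyset\} = \mathbb{S}^{-1}$. By convention, $\Phi_Q = \{\emptyset\}$ as well. If $k = |\gamma| = 1$,
then $\mathbb{S}^{k-2} = \mathbb{S}^{-1} = \{\emptyset\} = \partial(\gamma)$. The only proper subset of $\gamma$ in this case is $\gamma' = \emptyset$, and
$(\phi_R \circ \psi_R)(\emptyset) = \phi_R(X) = \emptyset$. (Reason: If $y \in \phi_R(X)$, then $y \in \gamma$, so $\gamma = \{y\}$, implying $\sigma = X$,
which is disallowed.) Thus conditions (a), (b), (c) all hold. If $k = |\gamma| \geq 2$, then conditions
(b) and (c) cannot hold. Also, condition (a) does not hold since $(\phi_R \circ \psi_R)(\{y\}) = \gamma$ for each
$y \in \gamma$, bearing in mind that $\tilde{X} = \emptyset$ means $X_y = \sigma$ for each $y \in \gamma$. In short, the theorem holds
when $\tilde{X} = \emptyset$.

We now assume that $\tilde{X} \neq \emptyset$. As in the proof of Theorem 9, we see readily that conditions
(b) and (c) are equivalent, so we will prove that conditions (a) and (c) are equivalent. And, as
in the previous proof, dualizing a formula from page 91 gives this formula:

$\psi_Q(\chi) = \psi_R(\chi) \setminus \sigma$, for all $\emptyset \neq \chi \subseteq \gamma$.

I. Assume that $(\phi_R \circ \psi_R)(\gamma') = \gamma'$, for every subset $\gamma'$ of $\gamma$.

We will establish that $\Phi_Q$ contains all proper subsets of $\gamma$ but not $\gamma$, telling us $\Phi_Q = \partial(\gamma)$.

Since $\Phi_Q$ is not void, it contains the empty simplex.

Pick some $\emptyset \neq \gamma' \subsetneq \gamma$. Since $(\phi_R \circ \psi_R)(\gamma') = \gamma'$, $\psi_R(\gamma') \supsetneq \sigma$.

The formula above therefore says $\psi_Q(\gamma') \neq \emptyset$, telling us $\gamma' \in \Phi_Q$.

Similarly, $\psi_Q(\gamma) = \psi_R(\gamma) \setminus \sigma = \sigma \setminus \sigma = \emptyset$, so $\gamma \notin \Phi_Q$.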

II. Assume that $\Phi_Q = \partial(\gamma)$.

Recall that $k = |\gamma| > 0$. We look at two cases based on the value of $k$:

$k = 1$: In this case, $\gamma = \{y\}$, for some $y \in Y$, so $\sigma = X_y$ and $\tilde{X} = \emptyset$, which we discussed
above.

$k > 1$: Suppose, for the sake of contradiction, that $\gamma' \subsetneq (\phi_R \circ \psi_R)(\gamma')$, for some $\gamma' \subsetneq \gamma$. By
Lemma 47 on page 88, we can assume $\gamma' = \gamma \setminus \{y\}$, for some $y \in \gamma$. Consequently,
$(\phi_R \circ \psi_R)(\gamma') = \gamma$, which implies $\psi_R(\gamma') = \sigma$. The formula above then says
$\psi_Q(\gamma') = \emptyset$, whereas the fact that $\gamma' \in \Phi_Q$ means $\psi_Q(\gamma') \neq \emptyset$, a contradiction.

The  following  lemma,  previously  stated  on  page  29,  relates  privacy  preservation  in  a  link 
to  privacy  preservation  in  the  encompassing  relation. 

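Lemma 11 (Interpreting Local Operators). Let $R$ be a relation on $X \times Y$. Suppose
$(\sigma, \gamma) \in P_R$, with $\sigma \neq X$. Let $Q$ be the relation on $\tilde{X} \times \gamma$ that models $\mathrm{Lk}(\Psi_R, \sigma)$ and
suppose $\tilde{X} \neq \emptyset$.

Then, for every $\gamma' \subseteq \gamma$:

(i)  If $\gamma' \notin \Phi_Q$, then $\psi_R(\gamma') = \sigma$,

(ii)  If $\gamma' \in \Phi_Q$, then $\psi_R(\gamma') \supsetneq \sigma$.

Moreover, in this case:

If $(\phi_Q \circ \psi_Q)(\emptyset) = \emptyset$, then $(\phi_R \circ \psi_R)(\emptyset) = \emptyset$.

If $\gamma' \neq \emptyset$, then $(\phi_Q \circ \psi_Q)(\gamma') = (\phi_R \circ \psi_R)(\gamma')$.

Proof. Observe that for every $\gamma' \subseteq \gamma$, one has $\gamma' \in \Phi_R$ and $\psi_R(\gamma') \supseteq \psi_R(\gamma) = \sigma$.

By the formula on page 91 dualized, if $\emptyset \neq \gamma' \subseteq \gamma$, then $\psi_Q(\gamma') = \psi_R(\gamma') \setminus \sigma$.

(i) Suppose $\gamma' \notin \Phi_Q$. Then $\gamma' \neq \emptyset$ since $\emptyset \in \Phi_Q$. Also, $\psi_Q(\gamma') = \emptyset$, so by the formula
above, $\psi_R(\gamma') = \sigma$.

(ii) Suppose $\gamma' \in \Phi_Q$. If $\gamma' = \emptyset$, then $\psi_R(\emptyset) = X \supsetneq \sigma$, by hypothesis. If $\gamma' \neq \emptyset$, then
$\psi_Q(\gamma') \neq \emptyset$, so again by the formula above, $\psi_R(\gamma') \supsetneq \sigma$.

Turning to the "Moreover":

If $y \in (\phi_R \circ \psi_R)(\emptyset)$, then $y$ is an attribute for all individuals in $X$, so $y \in \gamma$ and
$y \in \phi_Q(\tilde{X}) = (\phi_Q \circ \psi_Q)(\emptyset)$.

Let $\emptyset \neq \gamma' \in \Phi_Q$. By the formula on page 90 dualized, if $\kappa \subseteq \tilde{X}$, then $\phi_Q(\kappa) = \phi_R(\kappa \cup \sigma)$.
Therefore: $(\phi_Q \circ \psi_Q)(\gamma') = \phi_Q(\psi_R(\gamma') \setminus \sigma) = (\phi_R \circ \psi_R)(\gamma')$.  □

Comment: Also, $(\phi_Q \circ \psi_Q)(\emptyset) = \phi_Q(\tilde{X}) = \phi_R\big(\bigcup_{y \in \gamma} X_y\big) = \bigcap_{y \in \gamma} (\phi_R \circ \psi_R)(\{y\})$.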

E.3  Preserving Attribute and Association Privacy

In this subsection, we are interested in understanding relations that preserve both attribute
and association privacy. We will discover that this requirement is severely limiting. As one can
already see from Theorem 64 on page 97, if $R$ is a nonvoid tight relation on $X \times Y$ that preserves
both attribute and association privacy, then $|X| = |Y| = n$. What are the possibilities?

n  =  0:  Not  possible;  this  is  the  void  relation. 

n  =  1:  Not  possible;  such  a  relation  does  not  preserve  privacy;  one  can  infer  the  single  individual 
or  single  attribute  from  nothing. 

n = 2: As we have seen before, such a relation must be isomorphic to the following relation:

      R   |  y1   y2
     -----+----------
      x1  |  •
      x2  |       •

Then both $\Psi_R$ and $\Phi_R$ are instances of the 0-sphere $\mathbb{S}^0$.

n ≥ 3: Now there are several possibilities:

—  The  relation  could  be  isomorphic  to  a  cyclic  staircase  relation : 


      R       |  y1   y2   y3   ...  y(n-1)  yn
     ---------+---------------------------------
      x1      |  •    •
      x2      |       •    •
      ...     |            ...  ...
      x(n-1)  |                      •       •
      xn      |  •                           •

Then both $\Psi_R$ and $\Phi_R$ are homotopic to the 1-sphere $\mathbb{S}^1$. Each is simply a linear
cycle of edges, with vertices in one complex dualizing to edges in the other.

—  The  relation  could  be  isomorphic  to  a  spherical  boundary  relation  in  which  every 
entry is present except that a diagonal is blank. For example, in the following
relation all entries are present except those for $(x_i, y_{n-i+1})$, $i = 1, \ldots, n$:

      R       |  y1   y2   ...  y(n-1)  yn
     ---------+----------------------------
      x1      |  •    •    ...  •
      x2      |  •    •    ...          •
      ...     |  ...  ...  ...  ...     ...
      x(n-1)  |  •         ...  •       •
      xn      |       •    ...  •       •

Then $\Psi_R$ and $\Phi_R$ are each boundary complexes, namely $\Psi_R = \partial(X)$ and $\Phi_R = \partial(Y)$.
Thus both are homotopic to the $(n-2)$-sphere $\mathbb{S}^{n-2}$.

—  Finally, $R$ could have multiple components, each of which is one of the following: A
singleton, a cyclic staircase relation, or a spherical boundary relation, all as above.
(Observe that even though a $1 \times 1$ relation in and of itself preserves no privacy, a
relation can preserve privacy over a $1 \times 1$ subrelation when that subrelation is one
of several components.)

(Comment:  the  staircase  and  spherical  relations  are  isomorphic  when  n  =  3.) 

The  aim  of  this  subsection  is  to  prove  that  these  are  the  only  possibilities. 
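
The two connected families above can be generated and inspected mechanically. The following is a minimal sketch, not taken from the report: it assumes a relation is stored as a dictionary from individuals to attribute sets with 0-based indices, and the function names (`cyclic_staircase`, `spherical_boundary`, `columns`, `maximal_sets`) are hypothetical. It checks only row and column shapes, not the privacy properties themselves.

```python
from itertools import combinations

def cyclic_staircase(n):
    """Row i holds attributes i and (i + 1) mod n (0-based indices)."""
    return {i: {i, (i + 1) % n} for i in range(n)}

def spherical_boundary(n):
    """Row i holds every attribute except the anti-diagonal one, n - 1 - i."""
    return {i: set(range(n)) - {n - 1 - i} for i in range(n)}

def columns(rel, n):
    """Transpose a relation: attribute j -> set of individuals that have it."""
    return {j: {i for i, ys in rel.items() if j in ys} for j in range(n)}

def maximal_sets(family):
    """Keep only the inclusion-maximal members of a family of sets."""
    fam = {frozenset(s) for s in family}
    return {s for s in fam if not any(s < t for t in fam)}

def all_subsets_of_size(n, r):
    """Every r-element subset of {0, ..., n-1}, as frozensets."""
    return {frozenset(c) for c in combinations(range(n), r)}

if __name__ == "__main__":
    n = 5
    sphere = spherical_boundary(n)
    # The rows of the spherical boundary relation are exactly the (n-1)-subsets
    # of the attributes, i.e. the maximal simplices of the boundary of a simplex.
    print(maximal_sets(sphere.values()) == all_subsets_of_size(n, n - 1))
```

The rows (respectively columns) of a relation generate the maximal simplices of its attribute (respectively individual) complex, which is why comparing the row family with the set of all $(n-1)$-subsets is a reasonable stand-in for checking the boundary-complex claim in this sketch.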


Lemma 70. Let $R$ be a connected tight relation on $X \times Y$, with $|X| = |Y| \ge 3$, that preserves both attribute and association privacy.

Let $x \in X$ and define $Q$ to be the relation on $\overline{X} \times Y_x$ that models $\mathrm{Lk}(\Psi_R, x)$.

Then $\Psi_Q = \partial(\overline{X})$ and $\Phi_Q = \partial(Y_x)$, with $|\overline{X}| = |Y_x|$.

Proof. Observe that $Y_x \neq \emptyset$ since $R$ is tight. Recall that $\overline{X} = \bigcup_{y \in Y_x} X_y \setminus \{x\}$, which is nonempty since $R$ is connected and $X$ contains not just $x$.

By Lemma 63 on page 95, $x$ is uniquely identifiable via $R$, so Theorem 9 on page 100 says that $\Psi_Q \simeq \mathbb{S}^{k-2}$ and $\Phi_Q = \partial(Y_x)$, with $k = |Y_x|$. If we can show that $|\overline{X}| = k$, then we can conclude that $\Psi_Q = \partial(\overline{X})$.

The vertices of $\Psi_Q$ generate the maximal simplices of $\Phi_Q$. In particular, there exist $x_1, \ldots, x_k \in \overline{X}$ such that $Y_1, \ldots, Y_k$ are the maximal simplices of $\Phi_Q$, with $Y_i = Y_{x_i} \cap Y_x$ and $|Y_i| = k - 1$, for $i = 1, \ldots, k$.

Let $\bar{x} \in \overline{X}$. Then $Y_{\bar{x}} \cap Y_x \subseteq Y_i \subseteq Y_{x_i}$, for some $i \in \{1, \ldots, k\}$.

That says $\emptyset \neq \phi_R(\{x, \bar{x}\}) \subseteq \phi_R(\{x_i\})$.

Since $R$ preserves association privacy, the dualization of Lemma 59 on page 93 implies $\bar{x} = x_i$. Thus $|\overline{X}| = k$. □

Comment: Where did we use the assumption that each of $X$ and $Y$ has at least three elements? In fact, for the proof it is enough to assume that $|X| = |Y| \ge 2$. However, there is no connected tight relation that preserves privacy when $|X| = |Y| = 2$.

Corollary 71. Let $R$ be a connected tight relation on $X \times Y$, with $|X| = |Y|$, that preserves both attribute and association privacy.

Let $y \in Y$ and suppose $|X_y| \ge 4$.

Then $\mathrm{Lk}(\Phi_R, y)$ is not a linear cycle. (In other words, the relation $Q$ that models $\mathrm{Lk}(\Phi_R, y)$ is not isomorphic to a staircase relation.)

Proof. Arguing as in the proof of Lemma 70, now in dual form, we see that $\mathrm{Lk}(\Phi_R, y) \simeq \mathbb{S}^{k-2}$, with $k = |X_y|$. Since $k - 2 \ge 2$, $\mathrm{Lk}(\Phi_R, y)$ is not a linear cycle. □

Corollary 72. Let $R$ be a connected tight relation on $X \times Y$, with $|X| = |Y| \ge 3$, that preserves both attribute and association privacy.

Suppose $\{x, x'\}$, with $x \neq x'$, is an edge (1-simplex) in $\Psi_R$.

Then $|Y_x| = |Y_{x'}|$.

Proof. Let $k = |Y_x|$ and $k' = |Y_{x'}|$.

Observe that $x'$ is a vertex of $\mathrm{Lk}(\Psi_R, x)$ and $x$ is a vertex of $\mathrm{Lk}(\Psi_R, x')$.

By the proof of Lemma 70, each of $x'$ and $x$ generates a maximal simplex in the attribute complex associated with the other's link. That simplex is $Y_x \cap Y_{x'}$ in both complexes.

So $k - 1 = |Y_x \cap Y_{x'}| = k' - 1$, hence $k = k'$. □

Corollary 73. Let $R$ be a connected tight relation on $X \times Y$, with $|X| = |Y| \ge 3$, that preserves both attribute and association privacy.

Then all rows and columns have the same number of nonblank entries.

Proof. By Lemma 50 on page 88 and Corollary 72 above, all rows have the same number $k_r$ of nonblank entries. Dualizing, one sees that all columns have the same number $k_c$ of nonblank entries. We claim that $k_c = k_r$. This assertion follows from Lemma 70 and its proof as follows:

Pick some $x \in X$ and let $Q$ be the relation modeling $\mathrm{Lk}(\Psi_R, x)$. By Lemma 70, $\Psi_Q$ and $\Phi_Q$ are each boundary complexes, with $k_r = |Y_x|$ vertices. Moreover, each element $y \in Y_x$ generates a maximal simplex $X_y \setminus \{x\}$ in $\Psi_Q$, which must have size $k_r - 1$. The column $X_y$ contains one additional element, namely $x$. So $k_c = |X_y| = (k_r - 1) + 1 = k_r$. □

Theorem 74 (Privacy as Sphere). Let $R$ be a nonvoid connected tight relation on $X \times Y$ that preserves both attribute and association privacy.

Then $|X| = |Y| \ge 3$ and $R$ is isomorphic to either a cyclic staircase relation or a spherical boundary relation (each described on page 104).

Proof. As we commented previously, Theorem 64 implies that $|X| = |Y| = n$. Connectedness further means that $n \ge 3$.

By Corollary 73, all rows and columns in $R$ have the same number of nonblank entries. In other words, $|X_y| = |Y_x| = k$, for all $x \in X$ and all $y \in Y$, for some fixed $k$. By connectedness, $k \ge 2$.

By Lemma 63 on page 95, each $x \in X$ is uniquely identifiable via $R$. Dualized, each $y \in Y$ is uniquely identifiable via $R$ as well.

If $k = 2$, then $\Psi_R$ and $\Phi_R$ contain vertices and edges but no higher-dimensional simplices. By duality, each vertex therefore has at most two incident edges. By unique identifiability, each vertex has exactly two incident edges. Thus, by connectedness, each complex is a linear cycle. So $R$ is isomorphic to a staircase relation.

Now assume that $k \ge 3$.

Pick a $y \in Y$ and consider the decomposition of Figure 52, similar to the one we saw in the proof of Lemma 63. (We indicate blank entries either by blanks or by explicit "0"s.)

Let $X_1 = X_y$ and write $X = X_1 \cup X_2$ with $X_2 = X \setminus X_1$. $X_1 \neq \emptyset$ since every column of $R$ has $k$ nonblank entries and $X_2 \neq \emptyset$ since $R$ preserves attribute privacy.

Let $Q$ model $\mathrm{Lk}(\Phi_R, y)$. So $Q$ is $R$ restricted to $X_1 \times Y_1$, with $Y_1 = \bigcup_{x \in X_1} Y_x \setminus \{y\}$. $Y_1 \neq \emptyset$ because every row of $R$ has $k$ nonblank entries. In particular, there are exactly $k - 1$ entries in each row of $Q$, so at least two entries in each row.

Now write $Y$ as the disjoint union $Y = \{y\} \cup Y_1 \cup Y_2$, with $Y_2 = Y \setminus (Y_1 \cup \{y\})$. Observe that every element in $X_1$ has attribute $y$ but has no attributes in $Y_2$, by construction.

By the dual to Lemma 70, we know that $\Psi_Q = \partial(X_1)$ and $\Phi_Q = \partial(Y_1)$, with $k = |X_1| = |Y_1|$. Therefore, for each $\bar{y} \in Y_1$, column $X_{\bar{y}}$ of $R$ has $k - 1$ entries that lie in $X_1$ and one entry


DISTRIBUTION  A:  Distribution  approved  for  public  release. 


Final  Report,  AFOSR  Award  FA9550-14-1-0012 




Figure  52:  Relation  R  decomposed  into  blocks  for  the  proof  of  Theorem  74. 


that lies in $X_2$. We claim that the $X_2$ entry is the same across all columns $X_{\bar{y}}$ as $\bar{y}$ varies over $Y_1$. For otherwise, at least two such columns would have an intersection (nonempty, since $k - 2 \ge 1$) contained wholly within $X_y$, implying that $R$ permits attribute inference after all, by the proof of Lemma 59 on page 93. Call that common element $\bar{x}$. Observe that $Y_{\bar{x}} = Y_1$ since every row of $R$ has exactly $k$ elements. Consequently, the block diagram for $R$ becomes as in Figure 53.


Figure  53:  Relation  R  decomposed  further. 


Observe that no element of $X_1 \cup \{\bar{x}\}$ has any attributes in $Y_2$ and that no element of $X_2 \setminus \{\bar{x}\}$ has any attributes in $Y_1 \cup \{y\}$, by the row and column cardinality constraints. That means relation $C$, which is the restriction of $R$ to $(X_2 \setminus \{\bar{x}\}) \times Y_2$, is disconnected from the rest of $R$, if $C$ were to exist. We conclude that $Y_2 = \emptyset$ and that $X_2 = \{\bar{x}\}$. Thus, finally, $R$ must decompose as in Figure 54. As we have seen, $Q$ is nearly a full relation, missing a diagonal. We now see that $R$ is also nearly a full relation, missing a diagonal. Thus $\Psi_R = \partial(X)$ and $\Phi_R = \partial(Y)$, meaning $R$ is a spherical boundary relation, as claimed. □


Figure  54:  Relation  R  decomposes  diagonally. 


Corollary 75. Let $R$ be a nonvoid tight relation that preserves both attribute and association privacy. Decompose $R$ into its connected components as $R = R_1 \cup \cdots \cup R_\ell$, with $R_i$ a relation on $X_i \times Y_i$, as per the proof of Lemma 51 on page 89. Then $R_i$ is either a singleton or a staircase relation or a spherical boundary relation and $|X_i| = |Y_i|$, $i = 1, \ldots, \ell$.

Comment: When $\ell = 2$ and each of $R_1$ and $R_2$ is a singleton, then the Dowker complexes of $R$ itself, $\Psi_R$ and $\Phi_R$, are each an instance of $\mathbb{S}^0$.

Proof. Consider some $R_i$.

Suppose that $X_i \in \Psi_{R_i}$. Then some attribute $y \in Y_i$ is shared by all individuals in $X_i$.

If there were any other attributes in $Y_i$, then each of those would individually imply $y$ in $R$. Since $R$ preserves attribute privacy, $|Y_i| = 1$. Consequently, since $R$ also preserves association privacy, $|X_i| = 1$, so $R_i$ is a singleton.

If $R_i$ is not a singleton, then $X_i \notin \Psi_{R_i}$ and similarly $Y_i \notin \Phi_{R_i}$.

Consequently, Lemma 51 and Corollary 53 on page 89 tell us that $R_i$ is a nonvoid connected tight relation that preserves both attribute and association privacy. Theorem 74 completes the proof. □
F Poset Chains

Recall Definition 12, on page 38, of the Galois lattice associated with a relation $R$, and Definition 13, on page 41, defining informative attribute release sequences. In this appendix we will explore connections between these two concepts.

F.1 Maximal Chains and Informative Attribute Release Sequences

Let $R$ be a nonvoid relation on $X \times Y$. Suppose $\{(\sigma_k, \gamma_k) < \cdots < (\sigma_1, \gamma_1) < (\sigma_0, \gamma_0)\}$ is a maximal chain in $P_R^+$. Then, for $1 \le i \le k$, $\sigma_i \subsetneq \sigma_{i-1}$ and $\gamma_i \supsetneq \gamma_{i-1}$.

Also, $\sigma_0 = X$ and $\gamma_k = Y$. Note that $\gamma_0 = \phi_R(X)$ and $\sigma_k = \psi_R(Y)$. Consequently, $\gamma_0 \neq \emptyset$ if and only if $X \in \Psi_R$, and $\sigma_k \neq \emptyset$ if and only if $Y \in \Phi_R$.

We sometimes speak of a maximal chain at and above $(\sigma, \gamma)$, by which we mean a chain $\{(\sigma, \gamma) < \cdots < (\sigma_1, \gamma_1) < (\sigma_0, \gamma_0)\}$ in $P_R^+$ that is maximal among all such chains. Such a chain is a prefix of a full maximal chain in $P_R^+$ ("prefix" with respect to our subscript ordering, which starts at the top of a poset and moves downward).

Lemma 20 (Informative Attributes from Maximal Chains). Let $R$ be a relation on $X \times Y$, with both $X$ and $Y$ nonempty. Suppose $\{(\sigma_k, \gamma_k) < \cdots < (\sigma_1, \gamma_1) < (\sigma_0, \gamma_0)\}$, with $k \ge 1$, is a maximal chain in $P_R^+$.

Define $y_1, \ldots, y_k$ by selecting some $y_i \in \gamma_i \setminus \gamma_{i-1}$, for each $i = 1, \ldots, k$.

Then $y_1, \ldots, y_k$ is an informative attribute release sequence for $R$.

Moreover, $(\phi_R \circ \psi_R)(\{y_1, \ldots, y_i\}) = \gamma_i$ for each $i = 0, 1, \ldots, k$.

Proof. Establishing the "Moreover" also establishes the "iars" assertion.

The proof is by induction on $i$.

For the base case, $i = 0$ and we need to show that $(\phi_R \circ \psi_R)(\emptyset) = \gamma_0$.

Calculating, $(\phi_R \circ \psi_R)(\emptyset) = \phi_R(X) = \gamma_0$, by our earlier comments about maximal chains.

For the induction step, we assume that, for some $1 \le i \le k$, the assertion holds for indices smaller than $i$ and we need to show the assertion holds for $i$. First, observe:

$$\psi_R(\{y_1, \ldots, y_i\}) = \psi_R(\{y_1, \ldots, y_{i-1}\}) \cap X_{y_i} = \psi_R(\gamma_{i-1}) \cap X_{y_i} = \psi_R(\gamma_{i-1} \cup \{y_i\}).$$

(The middle equality follows from the induction hypothesis and a dual version of Corollary 45 from page 87, specifically because $(\phi_R \circ \psi_R)(\{y_1, \ldots, y_{i-1}\}) = \gamma_{i-1}$ and $\psi_R \circ \phi_R \circ \psi_R = \psi_R$.)

Since $\gamma_{i-1} \subsetneq \gamma_{i-1} \cup \{y_i\} \subseteq \gamma_i$,

$$\gamma_{i-1} = (\phi_R \circ \psi_R)(\gamma_{i-1}) \subsetneq (\phi_R \circ \psi_R)(\gamma_{i-1} \cup \{y_i\}) \subseteq (\phi_R \circ \psi_R)(\gamma_i) = \gamma_i.$$

By maximality of the original chain and the nature of elements in $P_R^+$ we see that $(\phi_R \circ \psi_R)(\gamma_{i-1} \cup \{y_i\}) = \gamma_i$, so $(\phi_R \circ \psi_R)(\{y_1, \ldots, y_i\}) = (\phi_R \circ \psi_R)(\gamma_{i-1} \cup \{y_i\}) = \gamma_i$. □
Here is a partial converse:

Lemma 21 (Chains from Informative Attributes). Let $R$ be a relation on $X \times Y$, with both $X$ and $Y$ nonempty. Suppose $y_1, \ldots, y_k$ is an informative attribute release sequence for $R$, with $k \ge 1$.

Let $\gamma_i = (\phi_R \circ \psi_R)(\{y_1, \ldots, y_i\})$ and $\sigma_i = \psi_R(\gamma_i)$, for $i = 1, \ldots, k$.

Let $\gamma_0 = \phi_R(X)$. Then $\{(\sigma_k, \gamma_k) < \cdots < (\sigma_1, \gamma_1) < (X, \gamma_0)\}$ is a chain in $P_R^+$.

Comment: The resulting chain need not be maximal.

Proof. Observe that each $(\sigma_i, \gamma_i) \in P_R^+$ by construction, so we need to establish the total ordering. Let's define $\sigma_0 = X$. We need to show that $\sigma_i \subsetneq \sigma_{i-1}$, for each $i = 1, \ldots, k$.

Since $\{y_1, \ldots, y_i\} \supsetneq \{y_1, \ldots, y_{i-1}\}$, we see that $\sigma_i \subseteq \sigma_{i-1}$. If $\sigma_i = \sigma_{i-1}$, then also $\gamma_i = \gamma_{i-1}$, contradicting the fact that $y_i \in \gamma_i \setminus \gamma_{i-1}$ (which is true by the nature of informative attribute release sequences). □

As a corollary to Lemmas 20 and 21, one sees that every informative attribute release sequence (iars) for $R$ is a subsequence of an iars derived from a maximal chain in $P_R^+$. (Technically, one needs to show that any nonempty subsequence of an iars is itself an iars. And one needs to show that extending any chain obtained via Lemma 21 to a maximal chain retains the original iars as a subsequence of one subsequently obtainable via Lemma 20. All that is straightforward.)
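
The passage from release sequences to chains in Lemma 21 can be traced on a small example. The sketch below is illustrative only: it assumes a relation is a dictionary from individuals to attribute sets, and it reads "informative attribute release sequence" as meaning that each released attribute lies outside the closure $(\phi_R \circ \psi_R)$ of the attributes released before it, which is how the proofs above use the term. The function names are hypothetical.

```python
def phi(rel, sigma, attrs):
    """Attributes shared by every individual in sigma (all of attrs if sigma is empty)."""
    out = set(attrs)
    for x in sigma:
        out &= rel[x]
    return out

def psi(rel, gamma):
    """Individuals having every attribute in gamma (everyone if gamma is empty)."""
    return {x for x, ys in rel.items() if set(gamma) <= ys}

def closure(rel, gamma, attrs):
    """The composite (phi o psi) applied to a set of attributes."""
    return phi(rel, psi(rel, gamma), attrs)

def is_iars(rel, seq, attrs):
    """Assumed reading: no released attribute is implied by the earlier ones."""
    return all(y not in closure(rel, seq[:i], attrs) for i, y in enumerate(seq))

def chain_from_iars(rel, seq, attrs):
    """Pairs (sigma_i, gamma_i) for i = 0..k as in Lemma 21, top element first."""
    chain = [(set(rel), phi(rel, set(rel), attrs))]
    for i in range(1, len(seq) + 1):
        gamma_i = closure(rel, seq[:i], attrs)
        chain.append((psi(rel, gamma_i), gamma_i))
    return chain

# A 3 x 3 cyclic staircase relation used purely as a toy example.
R = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"a", "c"}}
Y = {"a", "b", "c"}

if __name__ == "__main__":
    for sigma, gamma in chain_from_iars(R, ["a", "b", "c"], Y):
        print(sorted(sigma), sorted(gamma))
```

In this toy relation the sets of individuals shrink strictly at every step while the attribute sets grow, which is the strict ordering that the proof of Lemma 21 establishes.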

F.2  Chains  and  Links 

We are interested in understanding how chains and informative attribute release sequences behave as one passes to links. (Small caution: whereas we were looking at chains in $P_R^+$ before, we focus here on $P_R$ (and $P_Q$).)

Lemma 76 (Chains in Links). Let $R$ be a relation on $X \times Y$, with both $X$ and $Y$ nonempty and suppose $(\sigma, \gamma) \in P_R$. Let $Q$ be the relation modeling $\mathrm{Lk}(\Psi_R, \sigma)$. Then

$$P_Q = \{(\sigma' \setminus \sigma, \gamma') \mid (\sigma, \gamma) < (\sigma', \gamma') \in P_R\}.$$

Comments:

• $Q$ is the restriction of $R$ to $\overline{X} \times \gamma$, with $\overline{X} = \bigcup_{y \in \gamma} X_y \setminus \sigma$.

• $P_Q$ could be empty. This occurs precisely when $(\sigma, \gamma)$ is a maximal element of $P_R$, which occurs precisely when $\mathrm{Lk}(\Psi_R, \sigma) = \{\emptyset\}$.

• If $\sigma = X$, then $\mathrm{Lk}(\Psi_R, \sigma) = \{\emptyset\}$ and so $P_Q = \emptyset$, given Definition 8 on page 27. Note however that Definition 18 of $Q(\sigma, \gamma)$ on page 43 would mean that $P_{Q(\sigma,\gamma)}$ is degenerate.

• $P_Q$ never contains the element $\hat{0}_Q = (\emptyset, \gamma)$ of $P_Q^+$. That element corresponds to $(\sigma, \gamma)$ in $P_R$, consistent with the idea of Lemma 11 on page 29 that one has "localized (to) $\sigma$ upon observing $\gamma$". (See also Definition 15 on page 42.)

• $P_Q$ could contain the element $\hat{1}_Q = (\overline{X}, \bar{\gamma})$ of $P_Q^+$, for some $\bar{\gamma}$. That happens precisely when $\overline{X} \neq \emptyset$ and all individuals in $\overline{X}$ share an attribute of $\gamma$, in which case $\bar{\gamma} \neq \emptyset$.

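Proof. The proof relies on dual versions of the formulas on pages 90-91.

I. Suppose $(\kappa, \eta) \in P_Q$. So $\kappa \neq \emptyset$ and $\eta \neq \emptyset$. Also, $\Psi_Q = \mathrm{Lk}(\Psi_R, \sigma)$, so $\kappa \cap \sigma = \emptyset$ and $\kappa \cup \sigma \in \Psi_R$. Let $\sigma' = \kappa \cup \sigma$. So $\sigma \subsetneq \sigma'$. We can take $\gamma'$ to be $\eta$ since $\eta = \phi_Q(\kappa) = \phi_R(\sigma')$. Note that $\psi_R(\gamma') = \psi_Q(\eta) \cup \sigma = \kappa \cup \sigma = \sigma'$. We have shown that $(\sigma', \gamma') \in P_R$ and $(\sigma, \gamma) < (\sigma', \gamma')$.

II. Suppose $(\sigma', \gamma') \in P_R$ and $(\sigma, \gamma) < (\sigma', \gamma')$. So $\sigma \subsetneq \sigma'$ and $\gamma \supsetneq \gamma'$. Let $\kappa = \sigma' \setminus \sigma$. Note that $\kappa \neq \emptyset$ and $\gamma' \neq \emptyset$. Moreover, $\kappa \in \mathrm{Lk}(\Psi_R, \sigma)$, so $\overline{X} \neq \emptyset$.

Verifying correspondence: $\phi_Q(\kappa) = \phi_R(\sigma') = \gamma'$ and $\psi_Q(\gamma') = \psi_R(\gamma') \setminus \sigma = \sigma' \setminus \sigma = \kappa$.

We have shown that $(\sigma' \setminus \sigma, \gamma') \in P_Q$. □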
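
The correspondence in Lemma 76 can be checked by brute force on small relations. The following sketch makes several assumptions that are not spelled out in this appendix: a relation is a dictionary from individuals to attribute sets, $P_R$ is taken to consist of the pairs $(\sigma, \gamma)$ with both components nonempty, $\gamma = \phi_R(\sigma)$ and $\sigma = \psi_R(\gamma)$, and the link relation is the restriction described in the first comment above. All function names are hypothetical.

```python
from itertools import combinations

def nonempty_subsets(items):
    """All nonempty subsets of a finite collection, as frozensets."""
    items = sorted(items)
    for r in range(1, len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)

def poset(rel, attrs):
    """Pairs (sigma, gamma), both nonempty, with gamma = phi(sigma) and sigma = psi(gamma)."""
    pairs = set()
    for sigma in nonempty_subsets(rel):
        gamma = frozenset(set(attrs).intersection(*(rel[x] for x in sigma)))
        if gamma and frozenset(x for x, ys in rel.items() if gamma <= ys) == sigma:
            pairs.add((sigma, gamma))
    return pairs

def link_relation(rel, sigma, gamma):
    """Restrict rel to individuals outside sigma that have some attribute in gamma."""
    return {x: ys & gamma for x, ys in rel.items() if x not in sigma and ys & gamma}

def check_lemma_76(rel, attrs):
    """Compare P_Q, computed directly, with the set predicted by Lemma 76."""
    P_R = poset(rel, attrs)
    for sigma, gamma in P_R:
        Q = link_relation(rel, sigma, gamma)
        predicted = {(s - sigma, g) for s, g in P_R if sigma < s}
        if poset(Q, gamma) != predicted:
            return False
    return True

# Toy relation: the 3 x 3 cyclic staircase.
R = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"a", "c"}}

if __name__ == "__main__":
    print(check_lemma_76(R, {"a", "b", "c"}))
```

Enumerating all subsets is exponential in the number of individuals, so this is only meant for hand-sized examples.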

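Corollary 77 (Order Preservation). Let $R$ and $Q$ be as in Lemma 76, with $(\sigma, \gamma) \in P_R$.

Then $(\sigma, \gamma) < (\sigma_1, \gamma_1) \le (\sigma_2, \gamma_2)$ in $P_R$ if and only if $(\sigma_1 \setminus \sigma, \gamma_1) \le (\sigma_2 \setminus \sigma, \gamma_2)$ in $P_Q$.

Proof. By Lemma 76 and because:

(a) $\sigma \subsetneq \sigma_1 \subseteq \sigma_2$ implies $\emptyset \neq \sigma_1 \setminus \sigma \subseteq \sigma_2 \setminus \sigma$;

(b) $\emptyset \neq \kappa_1 \subseteq \kappa_2$ implies $\sigma \subsetneq (\kappa_1 \cup \sigma) \subseteq (\kappa_2 \cup \sigma)$. □

Corollary 78 (Maximal Chain Preservation). Let $R$ and $Q$ be as in Lemma 76, with $(\sigma, \gamma) \in P_R$. Then $\{(\sigma, \gamma) < (\sigma_k, \gamma_k) < \cdots < (\sigma_1, \gamma_1)\}$ is a maximal chain at and above $(\sigma, \gamma)$ in $P_R$ if and only if $\{(\sigma_k \setminus \sigma, \gamma_k) < \cdots < (\sigma_1 \setminus \sigma, \gamma_1)\}$ is a maximal chain in $P_Q$.

Proof. By Lemma 76 and Corollary 77, we know that $\{(\sigma, \gamma) < (\sigma_k, \gamma_k) < \cdots < (\sigma_1, \gamma_1)\}$ is a chain extending upward from $(\sigma, \gamma)$ in $P_R$ if and only if $\{(\sigma_k \setminus \sigma, \gamma_k) < \cdots < (\sigma_1 \setminus \sigma, \gamma_1)\}$ is a chain in $P_Q$.

Maximality follows for the same reason: Refine or extend a chain in one poset and one can refine or extend the corresponding chain in the other poset as well. □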

Comment about "length": Recall that the length of a chain in a poset is one less than the number of elements in the chain. We also speak of the length of an informative attribute release sequence $y_1, \ldots, y_k$, which is $k$, the actual number of elements in the sequence.

In the context of Lemmas 20 and 21, there is a happy alignment of definitions: The length $k$ of a longest iars in $R$ is the length $\ell(P_R^+)$.

In thinking about poset lengths, bear in mind that $\ell(P_R^+)$ may be any of $\ell(P_R)$, $\ell(P_R) + 1$, or $\ell(P_R) + 2$, depending on whether the top and/or bottom elements of $P_R^+$ already lie in $P_R$.

Corollary 79 (Longest Localization Sequences). Let $R$ be a relation on $X \times Y$, with both $X$ and $Y$ nonempty and suppose $(\sigma, \gamma) \in P_R$. Let $Q$ be the relation modeling $\mathrm{Lk}(\Psi_R, \sigma)$.

If $X \notin \Psi_R$, then the length of a longest informative attribute release sequence for localizing $\sigma$ in $R$ is $\ell(P_Q) + 2$. If $X \in \Psi_R$ and $\sigma \neq X$, then that length is $\ell(P_Q) + 1$.

(Note: If $\sigma = X \in \Psi_R$, then the length is 0; one can identify $X$ in $R$ without observation.)

Comment: If $P_Q$ does not contain the top element $\hat{1}_Q$ of $P_Q^+$, then $\ell(P_Q) + 2 = \ell(P_Q^+)$, since $P_Q$ never contains the bottom element $\hat{0}_Q$. This occurs precisely when there is no attribute that is shared by all the individuals in the link.

Proof. Let us address one special case first, namely when $\mathrm{Lk}(\Psi_R, \sigma)$ is the empty complex. In that case, $P_Q$ is empty, so $\ell(P_Q) = -1$. Also observe that in this case any $y \in \gamma$ identifies $\sigma$, as otherwise $\overline{X}$ in the definition of $Q$ would not be empty. So long as $\sigma$ is not all of $X$, we do indeed have that $\ell(P_Q) + 2 = 1$. (Observe, by the way, that it is impossible for the following conditions to be satisfied simultaneously: $\sigma \subsetneq X \in \Psi_R$ and $\mathrm{Lk}(\Psi_R, \sigma) = \{\emptyset\}$.)

Suppose $\mathrm{Lk}(\Psi_R, \sigma)$ is not the empty complex and that $X \notin \Psi_R$. Lemmas 20 and 21 imply that a longest informative attribute release sequence for localizing $\sigma$ comes from a longest maximal chain in $P_R^+$ at and above $(\sigma, \gamma)$ and thus may be obtained by Corollary 78 from a maximal chain in $P_Q$. The length of the chain in $P_Q$ is two shorter than that in $P_R^+$. (Why? Because $(\sigma, \gamma) \in P_R$ becomes $\hat{0}_Q \in P_Q^+$, which is not present in $P_Q$, and because the top element $\hat{1}_R = (X, \emptyset) \in P_R^+$ disappears altogether.) So $\ell(P_Q) + 2$ gives the correct length of the iars in $R$.

Suppose $\mathrm{Lk}(\Psi_R, \sigma)$ is not the empty complex but that $\sigma \subsetneq X \in \Psi_R$. The argument proceeds as before except that now the top element of $P_R^+$ looks like $\hat{1}_R = (X, \gamma_0)$, with $\gamma_0 \neq \emptyset$. It appears in $P_R$. Consequently, $\hat{1}_Q = (X \setminus \sigma, \gamma_0)$ and $\hat{1}_Q$ now also appears in $P_Q$. So a maximal chain in $P_Q$ is now only one shorter than a corresponding maximal chain in $P_R$ at and above $(\sigma, \gamma)$, meaning $\ell(P_Q) + 1$ gives the correct length of a longest iars. □

F.3  Isotropy 

We  turn  now  to  the  proof  of  our  isotropy  sphere  theorem,  with  the  theorem  replicated  here 
from  earlier  in  the  report.  Recall  also  Definitions  13,  14,  15,  16,  and  18  from  pages  41-43. 

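Theorem 19 (Isotropy = Minimal Identification = Sphere). Let $R$ be a relation and suppose $\emptyset \neq \gamma \in \Phi_R$. Let $\sigma = \psi_R(\gamma)$. Then the following four conditions are equivalent:

(a) $\gamma$ is isotropic.

(b) $\gamma$ is minimally identifying (for $\sigma$).

(c) $\Psi_{Q(\sigma,\gamma)} \simeq \mathbb{S}^{k-2}$, with $k = |\gamma|$.

(d) $\Phi_{Q(\sigma,\gamma)} = \partial(\gamma)$.

Proof. Observe that $\sigma \in \Psi_R$ and $\gamma \subseteq (\phi_R \circ \psi_R)(\gamma) = \phi_R(\sigma)$, so constructing $Q(\sigma, \gamma)$ is valid.

Note that $\gamma \notin \Phi_{Q(\sigma,\gamma)}$. For if there were some $x \in \overline{X}$ such that $(x, y) \in Q(\sigma, \gamma) \subseteq R$ for every $y \in \gamma$, then $x \in \sigma$, but $\sigma$ is disjoint from $\overline{X}$.

If $|\gamma| = 1$, then $\mathbb{S}^{k-2} = \mathbb{S}^{-1} = \{\emptyset\} = \partial(\gamma)$. Write $\gamma = \{y\}$. Then $\gamma$ is isotropic if and only if $y$ constitutes an informative attribute release sequence if and only if $y \notin \phi_R(X)$. If $y \in \phi_R(X)$, then $\sigma = X$ so our conventions say $\Psi_{Q(\sigma,\gamma)} = \Phi_{Q(\sigma,\gamma)} = \emptyset \neq \{\emptyset\}$. Moreover, $\psi_R(\emptyset) = \sigma$, so $\gamma$ is not minimally identifying. If $y \notin \phi_R(X)$, then $\sigma = X_y \subsetneq X$ and $\overline{X} = \emptyset$, and both $\Psi_{Q(\sigma,\gamma)}$ and $\Phi_{Q(\sigma,\gamma)}$ are instances of $\{\emptyset\}$, by our conventions. Moreover, $\psi_R(\emptyset) \neq \sigma$. So we see that (a), (b), (c), (d) are all equivalent when $|\gamma| = 1$.

Henceforth assume that $|\gamma| > 1$. It will be convenient to write $\gamma = \{y_1, \ldots, y_k\}$, with $k > 1$, with the element indexing chosen arbitrarily.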
As we have observed elsewhere, (c) and (d) are equivalent by Dowker duality and the fact that only a boundary complex can produce $\mathbb{S}^{k-2}$ homotopy type when the underlying vertex set has size $k$.

We will first show that (a) implies (d) and (b):

Suppose that $\gamma$ is isotropic.

We wish to show that all proper subsets of $\gamma$ are simplices in $\Phi_{Q(\sigma,\gamma)}$. Without loss of generality, consider $\{y_1, \ldots, y_{k-1}\}$. If we can show that $\psi_R(\{y_1, \ldots, y_{k-1}\}) \setminus \sigma \neq \emptyset$, then that provides an $x \in \overline{X}$ such that $(x, y_i) \in R$ for $i = 1, \ldots, k-1$, thereby establishing that $\{y_1, \ldots, y_{k-1}\} \in \Phi_{Q(\sigma,\gamma)}$. It also establishes that $\psi_R(\{y_1, \ldots, y_{k-1}\}) \supsetneq \sigma$. Since the "missing element" $y_k$ is arbitrary in $\gamma$, we see that $\Phi_{Q(\sigma,\gamma)} = \partial(\gamma)$ and that $\gamma$ is minimally identifying.

Suppose otherwise: $\psi_R(\{y_1, \ldots, y_{k-1}\}) = \sigma = \psi_R(\gamma)$, so also $(\phi_R \circ \psi_R)(\{y_1, \ldots, y_{k-1}\}) = (\phi_R \circ \psi_R)(\gamma) \supseteq \gamma$. That says $y_k \in (\phi_R \circ \psi_R)(\{y_1, \ldots, y_{k-1}\})$, violating the assumption that any ordering of $\gamma$ is an informative attribute release sequence.


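We will now show that (d) implies (a):

Suppose that $\Phi_{Q(\sigma,\gamma)} = \partial(\gamma)$.

If some ordering of $\gamma$ is not an informative attribute release sequence, then we can rearrange the sequence further to establish that the last element is implied by all the others, i.e., that $y_k \in (\phi_R \circ \psi_R)(\{y_1, \ldots, y_{k-1}\})$. Arguing as we did in the proof of Lemma 20 on page 109, we obtain:

$$\begin{aligned}
\psi_R(\{y_1, \ldots, y_{k-1}\}) &= (\psi_R \circ \phi_R)\bigl(\psi_R(\{y_1, \ldots, y_{k-1}\})\bigr) \\
&= \psi_R\bigl((\phi_R \circ \psi_R)(\{y_1, \ldots, y_{k-1}\})\bigr) \\
&= \psi_R\bigl(\{y_k\} \cup (\phi_R \circ \psi_R)(\{y_1, \ldots, y_{k-1}\})\bigr) \\
&= X_{y_k} \cap \psi_R\bigl((\phi_R \circ \psi_R)(\{y_1, \ldots, y_{k-1}\})\bigr) \\
&= X_{y_k} \cap \psi_R(\{y_1, \ldots, y_{k-1}\}) \\
&= \psi_R(\{y_1, \ldots, y_k\}) \\
&= \psi_R(\gamma) \\
&= \sigma.
\end{aligned}$$

On the other hand, since $\{y_1, \ldots, y_{k-1}\} \in \Phi_{Q(\sigma,\gamma)}$, there is a witness $x \in \overline{X}$, meaning $x \in \psi_R(\{y_1, \ldots, y_{k-1}\})$, which contradicts $\overline{X} \cap \sigma = \emptyset$.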


Finally, we will show that (b) implies (d):

Suppose that $\gamma$ is minimally identifying.

Observe that $\psi_R(\{y_1, \ldots, y_{k-1}\}) \supsetneq \sigma$. As above, this establishes $\{y_1, \ldots, y_{k-1}\} \in \Phi_{Q(\sigma,\gamma)}$, from which we conclude that $\Phi_{Q(\sigma,\gamma)} = \partial(\gamma)$, since the missing element $y_k$ was arbitrary. □
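
Conditions (a), (b), and (d) of Theorem 19 are easy to test directly on small relations. The sketch below is a rough illustration under stated assumptions rather than the report's own procedure: it handles only $|\gamma| \ge 2$ (avoiding the conventions for the $|\gamma| = 1$ case), reads "isotropic" as "every ordering of $\gamma$ is an informative attribute release sequence", reads "minimally identifying" as "every proper subset of $\gamma$ is consistent with strictly more individuals than $\sigma$", and treats $Q(\sigma, \gamma)$ as the restriction of $R$ to individuals outside $\sigma$. Function names are hypothetical.

```python
from itertools import combinations, permutations

def psi(rel, gamma):
    """Individuals having every attribute in gamma (everyone if gamma is empty)."""
    return {x for x, ys in rel.items() if set(gamma) <= ys}

def closure(rel, gamma, attrs):
    """Attributes shared by all individuals consistent with gamma."""
    out = set(attrs)
    for x in psi(rel, gamma):
        out &= rel[x]
    return out

def is_isotropic(rel, gamma, attrs):
    """(a), as read here: every ordering of gamma is an iars."""
    return all(
        all(y not in closure(rel, order[:i], attrs) for i, y in enumerate(order))
        for order in permutations(gamma)
    )

def is_minimally_identifying(rel, gamma):
    """(b), as read here: no proper subset of gamma already pins down sigma."""
    sigma = psi(rel, gamma)
    return all(
        psi(rel, sub) > sigma
        for r in range(len(gamma))
        for sub in combinations(gamma, r)
    )

def attribute_complex_is_boundary(rel, gamma):
    """(d): each nonempty proper subset of gamma has a witness outside sigma; gamma has none."""
    sigma = psi(rel, gamma)
    outside = [ys for x, ys in rel.items() if x not in sigma]

    def witnessed(sub):
        return any(set(sub) <= ys for ys in outside)

    proper_ok = all(
        witnessed(sub) for r in range(1, len(gamma)) for sub in combinations(gamma, r)
    )
    return proper_ok and not witnessed(gamma)

if __name__ == "__main__":
    R = {1: {"a", "b", "c"}, 2: {"a", "b"}, 3: {"b", "c"}, 4: {"a", "c"}}
    g = ("a", "b", "c")
    print(is_isotropic(R, g, set(g)), is_minimally_identifying(R, g),
          attribute_complex_is_boundary(R, g))
```

On the example in the `__main__` block all three checks agree, as the theorem predicts; a relation in which one attribute implies another makes all three fail together.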



G  Many  Long  Chains 

This  appendix  provides  a  proof  of  Theorem  25  from  page  47. 

First,  we  need  some  tools: 

Recall  what  it  means  for  a  poset  to  be  almost  a  join-based  lattice  from  Definition  24  on  page  47. 

Definition 80 (Join Completion). Suppose $P$ is almost a join-based lattice. Let $S$ be a subset of $P$. The bounded join-completion of $S$ in $P$ is the set $S^\vee$ defined by:

$$S^\vee = \{\, p \in P \mid p \le s, \text{ some } s \in S, \text{ and } p = s_1 \vee \cdots \vee s_m, \text{ with each } s_i \in S, \text{ and } m \ge 1 \,\}.$$

Here and in the rest of this appendix, "$\le$" and "$<$" refer to the partial order on $P$, and "$\vee$" denotes the join operation on $P \cup \{\hat{1}\}$.

We also define $S_{\max}$ to consist of all the maximal elements of $S$ relative to the partial order inherited from $P$.

The following facts will be useful. Assume $S \subseteq P$, with $P$ almost a join-based lattice. Then:

1. $S^\vee$ is almost a join-based lattice. The join operation for elements $p, q \in S^\vee$ is given by:

$$p \vee_{S^\vee} q = \begin{cases} p \vee q, & \text{if } p \vee q \le s, \text{ for some } s \in S; \\ \hat{1}, & \text{otherwise.} \end{cases}$$

2. $S \subseteq S^\vee$ and $S_{\max} = (S^\vee)_{\max}$.

3. $(S^\vee)^\vee = S^\vee$.

4. If $T \subseteq S$, then $T^\vee \subseteq S^\vee$.

5. If $T \subseteq S^\vee$ such that $S_{\max} \setminus T \neq \emptyset$, then $T^\vee \subsetneq S^\vee$.

6. Let $\emptyset \neq T \subseteq S$. Then the poset

$$S_T = \{\, p \in S^\vee \mid p \le t, \text{ for all } t \in T \,\}$$

is almost a join-based lattice. The join operation for elements $p, q \in S_T$ is given by:

$$p \vee_{S_T} q = \begin{cases} p \vee q, & \text{if } p \vee q \le t, \text{ for all } t \in T; \\ \hat{1}, & \text{otherwise.} \end{cases}$$

7. Fact 6 holds as well for the poset $S'_T = \{\, p \in S^\vee \mid p < t, \text{ for all } t \in T \,\}$, now using "$<$" in place of "$\le$" throughout.
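
To make Definition 80 concrete, here is a small sketch that computes a bounded join-completion in a toy poset. It rests on assumptions not drawn from the report: the poset $P$ is taken to be the nonempty proper subsets of a four-element set ordered by inclusion (we simply assume this qualifies under Definition 24, which is not restated here), joins are computed as least upper bounds and assumed to exist whenever a common upper bound exists, and the string `TOP` is a hypothetical stand-in for the adjoined top element.

```python
from itertools import combinations

TOP = "TOP"  # hypothetical marker for the adjoined top element

def join(P, p, q):
    """Least upper bound of p and q inside P, or TOP when P has no common upper bound."""
    uppers = [u for u in P if p <= u and q <= u]
    least = [u for u in uppers if all(u <= v for v in uppers)]
    return least[0] if least else TOP

def join_completion(P, S):
    """Joins of members of S that remain below some member of S."""
    result = set(S)
    changed = True
    while changed:
        changed = False
        for p, q in combinations(list(result), 2):
            j = join(P, p, q)
            if j != TOP and j not in result and any(j <= s for s in S):
                result.add(j)
                changed = True
    return result

def maximal_elements(S):
    """Members of S that lie strictly below no other member of S."""
    return {s for s in S if not any(s < t for t in S)}

ground = range(1, 5)
P = [frozenset(c) for r in range(1, 4) for c in combinations(ground, r)]
S = {frozenset({1}), frozenset({2}), frozenset({3}), frozenset({1, 2, 3})}
S_join = join_completion(P, S)

if __name__ == "__main__":
    print(sorted(sorted(p) for p in S_join))
```

In this example the completion adds the three two-element subsets of {1, 2, 3}, and the maximal elements are unchanged, in line with Fact 2.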
Lemma 81 (Contractibility of Closed Semi-Intervals). Suppose $S \subseteq P$, with $P$ almost a join-based lattice. Let $\emptyset \neq T \subseteq S$ and define the poset $S_T$ as in Fact 6 on page 114.

If $S_T \neq \emptyset$, then $S_T$ is contractible.

Proof. Suppose $p$ and $q$ are arbitrary elements of $S_T$. Every element of $T$ is an upper bound for both $p$ and $q$. Since $T$ is not empty, this means $p \vee q$ exists in $P$ and $p \vee q \le t$ for all $t \in T$. Since $t \in S$, we have that $p \vee q \in S^\vee$ and thus $p \vee q \in S_T$ as well. Consequently, the lattice $S_T \cup \{\hat{0}, \hat{1}\}$ is noncomplemented, implying that $S_T$ is contractible, by a fact on page 84. □

Intuitively: $\Delta(S_T)$ is a cone with apex $\bigwedge T$, the meet of all the upper bounds.

Caution: The lemma need not hold for $S'_T$ as defined in Fact 7.

We now specialize a topological tool to our current setting. We refer to the lemma as "cycle tightening" because we will apply the lemma with $p \in S_{\max}$ and with $z$ a reduced homology generator of $\Delta(P)$. The lemma will allow us to move that generator downward in $P$.

Lemma 82 (Cycle Tightening). Let $P$ be almost a join-based lattice. Suppose $z = \sum_i n_i \tau_i$ is a nontrivial reduced $k$-cycle for $\Delta(P)$, i.e., $0 \neq z \in C_k(\Delta(P); \mathbb{Z})$ and $\partial z = 0$, for some $k \ge 0$. Define $S = \|z\|$ and $K = \{\tau \in \Delta(P) \mid \tau \subseteq S^\vee\}$.

Let $p \in S$.

If $\widetilde{H}_{k-1}(\mathrm{Lk}(K, p); \mathbb{Z}) = 0$, then there exists $\eta \in C_{k+1}(\mathrm{St}(K, p); \mathbb{Z})$ such that $p \notin \|z + \partial\eta\|$, now viewing $\eta \in C_{k+1}(\Delta(P); \mathbb{Z})$.

Proof. Let $W = \mathrm{St}(K, p)$ and $A = \mathrm{Lk}(K, p)$. Note that $A$ is not the empty complex (that observation follows from the reduced homology assumption when $k = 0$ and the fact that $p$ is part of a simplex containing at least one other element when $k > 0$).

The long exact sequence for a pair [12] therefore gives us the following exact sequence:

$$0 = \widetilde{H}_k(W; \mathbb{Z}) \longrightarrow H_k(W, A; \mathbb{Z}) \longrightarrow \widetilde{H}_{k-1}(A; \mathbb{Z}) = 0$$

The left 0 comes from $W$ being a cone and the right 0 comes from the lemma's hypotheses. Consequently, $H_k(W, A; \mathbb{Z}) = 0$. Now let $z_S$ consist of the part of $z$ that lies within $W$, so:

$$z_S = \sum_{\tau_i \in W} n_i \tau_i.$$

Since $z$ is a reduced $k$-cycle with support in $\mathrm{verts}(K)$, $z_S$ is a reduced relative $k$-cycle for the pair $(W, A)$.

Since $H_k(W, A; \mathbb{Z}) = 0$, $z_S$ must be a reduced relative boundary, so there exists $\kappa \in C_{k+1}(W; \mathbb{Z})$ such that $z_S = \partial\kappa + \gamma$, with $\gamma \in C_k(A; \mathbb{Z})$.

Now let $\eta = -\kappa$ and view $\eta \in C_{k+1}(\Delta(P); \mathbb{Z})$.

Observe that $\|z_S + \partial\eta\| \subseteq \mathrm{verts}(A) \subseteq \mathrm{verts}(\mathrm{dl}(K, p))$. Consequently, $p \notin \|z + \partial\eta\|$. □

Lemma 83 (Maximal Element Cardinality). Let $P$ be almost a join-based lattice. Suppose $P$ has reduced integral homology in dimension $k \ge 0$, that is, $\widetilde{H}_k(\Delta(P); \mathbb{Z}) \neq 0$. Consider a reduced homology generator $z = \sum_i n_i \tau_i$, for some collection $\{\tau_i\}$ such that $n_i \neq 0$ for each $\tau_i$.

Let $S = \|z\|$. Then $|S_{\max}| \ge k + 2$.

Proof. Since $S \subseteq S^\vee$, $z \in C_k(\Delta(S^\vee); \mathbb{Z})$. If there exists $\eta \in C_{k+1}(\Delta(S^\vee); \mathbb{Z})$ such that $\partial\eta = z$, then $z$ would also be a reduced boundary in $\Delta(P)$. So, $\widetilde{H}_k(\Delta(S^\vee); \mathbb{Z}) \neq 0$ and $z$ is a reduced homology generator for $\Delta(S^\vee)$.

Recall the notation $S_T$ in Fact 6 on page 114. We claim that

$$\bigcup_{t \in S_{\max}} \Delta(S_{\{t\}}) = \Delta(S^\vee).$$

To see this, first observe that the empty simplex $\emptyset$ appears in both these sets. Then:

I. Suppose $\emptyset \neq \sigma \in \Delta(S_{\{t\}})$ for some $t \in S_{\max}$. Being a chain in $S_{\{t\}}$, we can write $\sigma$ as $\{p_0 < p_1 < \cdots < p_\ell\}$, for some $\ell \ge 0$, with each $p_i \in S^\vee$ and with $p_\ell \le t \in S_{\max} \subseteq S$. Consequently, $\sigma \in \Delta(S^\vee)$ as well.

II. Suppose $\emptyset \neq \sigma \in \Delta(S^\vee)$. Then $\sigma = \{p_0 < p_1 < \cdots < p_\ell\}$, for some $\ell \ge 0$, with each $p_i \in S^\vee$. By definition of $S^\vee$ and $S_{\max}$, $p_\ell \le s \le t$, for some $s \in S$ and $t \in S_{\max}$. Consequently, $\sigma \in \Delta(S_{\{t\}})$ as well, for that $t$.

Similarly, one sees that, for any $\emptyset \neq T \subseteq S$,

$$\bigcap_{t \in T} \Delta(S_{\{t\}}) = \Delta(S_T).$$

The complex on the right is either empty or contractible, by Lemma 81, so we see that the intersection on the left is either empty or contractible.

A variation of the Nerve Lemma now implies that $\Delta(S^\vee)$ and the nerve of the simplicial complexes $\{\Delta(S_{\{t\}})\}_{t \in S_{\max}}$ have the same homotopy type (see Theorem 10.6(i) in [1]).

Since $\Delta(S^\vee)$ has reduced homology in dimension $k$, so does the nerve of $\{\Delta(S_{\{t\}})\}_{t \in S_{\max}}$. The nerve of $\{\Delta(S_{\{t\}})\}_{t \in S_{\max}}$ is isomorphic to a simplicial complex with underlying vertex set $S_{\max}$. In order for a simplicial complex to have reduced homology in dimension $k$ it must have at least $k + 2$ vertices. Thus $|S_{\max}| \ge k + 2$. □

We  now  turn  to  the  proof  of  the  main  theorem,  the  statement  of  which  is  replicated  here: 

Theorem 25 (Many Chains). Let $P$ be almost a join-based lattice. Suppose $P$ has reduced integral homology in dimension $k \ge 0$, that is, $\widetilde{H}_k(\Delta(P); \mathbb{Z}) \neq 0$.

Then there are at least $(k + 2)!$ maximal chains in $P$ of length at least $k$.

Proof.  The  proof  is  by  induction  on  k. 

I. For the base case, $k = 0$, observe that $\Delta(P)$ must have at least two vertices that are incomparable in $P$, as otherwise $\Delta(P)$ would be either empty or contractible. Each vertex sits inside a maximal chain of $P$. The chains are distinct since the vertices are incomparable.

II. For the induction step, assume that, for some $k \geq 1$, the theorem holds for all relevant $P$ with reduced homology in dimension $k-1$. We need to establish the theorem for all relevant $P$ with reduced homology in dimension $k$.
Let $z = \sum_i n_i \tau_i$ be a homology generator of $\tilde{H}_k(\Delta(P); \mathbb{Z})$, with $n_i \neq 0$ for all $i$.

Define $S$ and $K$ by $S = \|z\|$ and $K = \{\tau \in \Delta(P) \mid \tau \subseteq S^\vee\}$. Interpretation: $S$ is the support of the homology generator $z$ and $K$ is the subcomplex of $\Delta(P)$ formed by restricting to the bounded join-completion of $z$’s support.


We now have an inner induction, which we will describe as an iterative algorithm:
(Notation: superscript $(j)$ indicates the iteration.)

1. Initialize with $z^{(0)} = z$, $S^{(0)} = S$, and $K^{(0)} = K$.

2. Suppose $z^{(j)}$, $S^{(j)}$, and $K^{(j)}$ have been defined, with $z^{(j)}$ a homology generator of $\tilde{H}_k(\Delta(P); \mathbb{Z})$, and with $S^{(j)}$ and $K^{(j)}$ similar in meaning to $S$ and $K$, now based on $z^{(j)}$. In particular, $z^{(j)}$ has support $S^{(j)}$ and all of $K^{(j)}$’s vertices lie in $(S^{(j)})^\vee$.

Pick some $p \in (S^{(j)})_{\max}$ such that $\tilde{H}_{k-1}(\mathrm{Lk}(K^{(j)}, p); \mathbb{Z}) = 0$.

If no such $p$ exists, then the loop ends.

3. Otherwise, invoke Lemma 82 to find an $\eta \in C_{k+1}(\mathrm{St}(K^{(j)}, p); \mathbb{Z})$ such that $p \notin \|z^{(j)} + \partial\eta\|$. Let $z^{(j+1)} = z^{(j)} + \partial\eta$, $S^{(j+1)} = \|z^{(j+1)}\|$, and

$$K^{(j+1)} = \{\tau \in \Delta(P) \mid \tau \subseteq (S^{(j+1)})^\vee\}.$$

Observe that $S^{(j+1)} \subseteq \|z^{(j)}\| \cup \|\partial\eta\| \subseteq (S^{(j)})^\vee$.

On the other hand, $p \in (S^{(j)})_{\max} \setminus S^{(j+1)}$. So by Fact 5 on page 114, $(S^{(j+1)})^\vee \subsetneq (S^{(j)})^\vee$. In other words, the possible vertex set for the simplicial complex shrinks with each iteration and so the loop must eventually end, $P$ being finite.


Given this iterative algorithm, we can now assume without loss of generality that $\tilde{H}_{k-1}(\mathrm{Lk}(K, p); \mathbb{Z}) \neq 0$ for each $p$ that is a maximal element in the support $S$ of the given homology generator $z$.

Observe that $\mathrm{Lk}(K, p) = \{\tau \in \Delta(P) \mid \tau \subseteq S^\vee \text{ and } s < p \text{ for every } s \in \tau\}$, when $p \in S_{\max}$.

Consequently, $\mathrm{Lk}(K, p) = \Delta(Q_p)$, where $Q_p$ is the subposet of $P$ given by

$$Q_p = \{s \in S^\vee \mid s < p\}.$$

By Fact 7 on page 114, $Q_p$ is itself almost a join-based lattice.

$Q_p$ has reduced integral homology in dimension $k-1$, so by the induction hypothesis, there are at least $(k+1)!$ maximal chains in $Q_p$ of length at least $k-1$. As the description of $Q_p$ makes clear, we can extend each of these chains in $P$ by adding $p$ as a top element, then further refine and/or extend each chain as needed into a maximal chain in $P$. Distinct chains remain distinct after this augmentation since the process only adds elements of $P$ that lie outside $Q_p$.

Consequently, we obtain for each $p \in S_{\max}$ at least $(k+1)!$ distinct maximal chains in $P$ of length at least $k$, each touching $p$. A maximal chain in $P$ cannot contain more than one element of $S_{\max}$ since such elements are necessarily incomparable. Letting $p$ vary over $S_{\max}$ therefore produces at least $|S_{\max}| \cdot (k+1)!$ distinct maximal chains in $P$ of length at least $k$.

By Lemma 83, $|S_{\max}| \geq k+2$. So $P$ contains at least $(k+2)!$ distinct maximal chains of length at least $k$. □
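As a sanity check on the bound, the following Python sketch counts maximal chains in a small family of examples. It assumes, for illustration only, that $P$ is the poset of nonempty proper subsets of an $n$-element set ordered by inclusion; the order complex of that poset is the barycentric subdivision of the boundary of an $(n-1)$-simplex, hence a sphere of dimension $k = n-2$. The sketch does not verify the "almost a join-based lattice" hypothesis, and the function names are hypothetical. Chain length is taken to be the number of elements minus one.

```python
from itertools import combinations
from math import factorial

def proper_subsets(n):
    """Nonempty proper subsets of {0, ..., n-1}, as frozensets."""
    return [frozenset(c) for r in range(1, n) for c in combinations(range(n), r)]

def maximal_chains(elements):
    """Enumerate the maximal chains of a finite poset ordered by strict inclusion."""
    elements = list(elements)

    def covers(a, b):
        # b covers a: a < b with no element strictly in between
        return a < b and not any(a < c < b for c in elements)

    minimal = [e for e in elements if not any(o < e for o in elements)]
    chains = []

    def extend(chain):
        ups = [e for e in elements if covers(chain[-1], e)]
        if not ups:
            chains.append(tuple(chain))
        for e in ups:
            extend(chain + [e])

    for m in minimal:
        extend([m])
    return chains

for n in range(2, 6):
    k = n - 2                                   # dimension of the sphere
    chains = maximal_chains(proper_subsets(n))
    assert all(len(c) - 1 >= k for c in chains)  # every chain has length >= k
    print(n, k, len(chains), factorial(k + 2))
```

In this family the number of maximal chains equals $n! = (k+2)!$, so the lower bound of the theorem is attained here.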
Corollary 26 (Holes Reduce Inference). Let $R$ be a relation. Suppose $P_R$ has reduced integral homology in dimension $k \geq 0$. Then there are at least $(k+2)!$ maximal chains in $P_R$ of length at least $k$.

Proof. The assertion follows from Theorem 25, since $P_R$ is almost a join-based lattice.

(The join operation is exactly that of $P_R^+$. In particular, the top element $\hat{1}_R$ of $P_R^+$ is not already in $P_R$, since $P_R$ has homology, so we may adjoin that as the upper bound $\hat{1}$ for $P_R$.) □


Recall  informative  attribute  release  sequences  from  Appendix  F. 

Corollary 27 (Holes Defer Recognition). Let $R$ be a relation and let $(\sigma, \gamma) \in P_R$.

Define $Q = Q(\sigma, \gamma)$ as per Definition 18 and recall Definition 16, from pages 42-43.
Suppose $P_Q$ has reduced integral homology in dimension $k \geq 0$.

Then there are at least $(k+2)!$ distinct informative attribute release sequences $y_1, \ldots, y_\ell$ for $R$, each with $\ell \geq k+2$, such that $\psi_R(\{y_1, \ldots, y_\ell\}) = \sigma$. Consequently, $r_{\mathrm{slow}}(\sigma) \geq k+2$.

Proof. By Corollary 26, $P_Q$ contains at least $(k+2)!$ maximal chains of length at least $k$.

The rest of the argument is much like that in the proof of Corollary 79 on page 111:

• Each maximal chain in $P_Q$ gives rise to a maximal chain in $P_R^+$ at or above $(\sigma, \gamma)$.

• Distinctness in $P_Q$ carries over to $P_R^+$.

• In moving from $P_Q$ to $P_R^+$ one adds two elements:

1. One adds $(\sigma, \gamma)$, corresponding to $\hat{0}_Q$ in $P_Q^+$.

2. $P_Q$ has homology, so no attribute is shared by all individuals, either in $Q$ or $R$. One thus also adds the top element $\hat{1}_R$ of $P_R^+$, corresponding to $\hat{1}_Q$ in $P_Q^+$.

Summary: Each distinct maximal chain of $P_Q$ gives rise to a distinct maximal chain at or above $(\sigma, \gamma)$ in $P_R^+$ of length at least $k+2$, and therefore a distinct informative attribute release sequence of length at least $k+2$. So by Definition 16, $r_{\mathrm{slow}}(\sigma) \geq k+2$.

(How  do  we  know  that  distinct  maximal  chains  produce  distinct  iars?  Because  if  two  iars 
are  the  same,  the  chains  must  be  the  same,  by  the  “Moreover”  of  Lemma  20  on  page  109. 
It  is  true  that  one  may  be  able  to  obtain  different  iars  from  the  same  maximal  chain,  but 
our  counting  was  over  maximal  chains,  so  provides  a  lower  bound  for  the  number  of  distinct 
iars.)  □ 

Comment: Since $P_Q$ has reduced homology in nonnegative dimension, $\sigma \neq X$. Along with the assumption $(\sigma, \gamma) \in P_R$, that means relation $Q(\sigma, \gamma)$ models the link $\mathrm{Lk}(\Psi_R, \sigma)$.
Recall the discussion and terminology of Section 13.

The  primary  goal  of  this  appendix  is  to  provide  a  proof  of  Theorem  31.  In  addition,  this 
appendix  provides  proof  of  some  of  the  assertions  in  the  bullets  on  pages  68-69. 

Once  again,  we  first  need  to  develop  some  tools: 

H.1 Source Complex

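Subsection 13.1 introduced the strategy complex $\Delta_G$ of a graph $G = (V, \mathfrak{A})$. Recall that every action $a \in \mathfrak{A}$ has a unique source state in $V$. Given a set of actions $\mathcal{A} \subseteq \mathfrak{A}$, we say $\mathrm{src}(\mathcal{A})$ is the start region of $\mathcal{A}$, defined by

$$\mathrm{src}(\mathcal{A}) = \{v \in V \mid v \text{ is the source of some } a \in \mathcal{A}\}.$$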


One obtains another simplicial complex from $G$ via src, now on underlying vertex set $V$:

$$\Delta^G = \{\mathrm{src}(\sigma) \mid \sigma \in \Delta_G\}.$$

We refer to this complex as $G$’s source complex.
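The following Python sketch computes start regions and a source complex for a toy example. It rests on two assumptions that are not taken from this appendix: the graph is a hypothetical three-state cycle of deterministic actions (not the graph of Figure 44), and, for purely deterministic actions, a set of actions is treated as a simplex of the strategy complex exactly when it contains no directed cycle. All names are illustrative.

```python
from itertools import combinations

# Hypothetical graph: three states and three deterministic actions forming a cycle.
V = frozenset({1, 2, 3})
ACTIONS = {"a1": (1, 2), "a2": (2, 3), "a3": (3, 1)}  # name -> (source, target)

def src(action_set):
    """Start region: the set of source states of the given actions."""
    return frozenset(ACTIONS[a][0] for a in action_set)

def is_acyclic(action_set):
    """True if the chosen deterministic actions contain no directed cycle."""
    edges = {ACTIONS[a] for a in action_set}
    while edges:
        sources = {s for s, _ in edges}
        # Drop every edge whose target has no outgoing edge left.
        pruned = {(s, t) for (s, t) in edges if t in sources}
        if pruned == edges:      # nothing can be dropped, so a cycle remains
            return False
        edges = pruned
    return True

def subsets(items):
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# ASSUMPTION: simplices of the strategy complex are the acyclic action sets.
strategy_complex = {s for s in subsets(ACTIONS) if is_acyclic(s)}
source_complex = {src(s) for s in strategy_complex}

print(sorted(map(sorted, source_complex)))
```

For this toy graph the computed source complex consists of all proper subsets of the state set, that is, the boundary of a triangle.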


Figure 55: Source complex for the graph of Figure 44 on page 65.


The map $\mathrm{src} : \mathfrak{F}(\Delta_G) \to \mathfrak{F}(\Delta^G)$ is a homotopy equivalence, so $\Delta_G \simeq \Delta^G$ [6, 7]. Consequently, for a fully controllable graph, $\Delta^G = \partial(V)$, the boundary complex of the full simplex on vertex set $V$. For the graph of Figure 44 on page 65, the source complex is the boundary of a triangle, as shown in Figure 55.
Figure 56: Relation $B$ describes the source complex $\Delta^G$ of the graph of Figure 44. Each row describes the start region of a maximal simplex of $\Delta_G$, which appeared in Figure 45 on page 66. The rightmost column again shows each maximal strategy’s goal.
In Lemma 29 we saw that $\Delta_G = \Phi_A$ for the action relation $A$ defined there. We can see that $\Delta^G = \Phi_B$, for yet another relation, which we will refer to as $B$. Figure 56 shows that relation for the graph of Figure 44. More generally, we have the following lemma:

Lemma 84. Let $G = (V, \mathfrak{A})$ be a graph as discussed in Section 13 and let $\mathfrak{M}$ be the set of maximal simplices of $\Delta_G$. Define relation $B$ on $\mathfrak{M} \times V$ by $B = \{(\sigma, v) \mid v \in \mathrm{src}(\sigma) \text{ and } \sigma \in \mathfrak{M}\}$.

Then $\Phi_B = \Delta^G$.

(Again,  the  proof  is  nearly  definitional,  so  we  omit  it.) 

(The “B” stands for “Beginning”; while “S” for “source” might be desirable, we have already used $S$ to mean “support” elsewhere.)
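A small Python sketch can make the relation $B$ of Lemma 84 concrete. The data below are hypothetical: three maximal simplices, named `s12`, `s23`, `s13`, listed only by their start regions (as would arise from a three-state cycle of deterministic actions). The function `phi` computes the Dowker complex on the column set of a relation, namely the column subsets that share a common row, and the result is compared with the downward closure of the start regions.

```python
from itertools import combinations

# Hypothetical maximal simplices of a strategy complex, each with its start region.
START_REGION = {
    "s12": frozenset({1, 2}),
    "s23": frozenset({2, 3}),
    "s13": frozenset({1, 3}),
}
V = frozenset({1, 2, 3})

# Relation B on (maximal simplices) x V: sigma is related to v when v is in src(sigma).
B = {(sigma, v) for sigma, region in START_REGION.items() for v in region}

def phi(relation, columns):
    """Dowker complex on the columns: column subsets sharing at least one common row."""
    rows = {r for r, _ in relation}
    cols = sorted(columns)
    simplices = set()
    for k in range(len(cols) + 1):
        for gamma in combinations(cols, k):
            if any(all((r, c) in relation for c in gamma) for r in rows):
                simplices.add(frozenset(gamma))
    return simplices

# Source complex: every face of the start region of some maximal simplex.
source_complex = {
    frozenset(face)
    for region in START_REGION.values()
    for k in range(len(region) + 1)
    for face in combinations(sorted(region), k)
}

print(phi(B, V) == source_complex)
```

The comparison uses the fact that every start region of a simplex is contained in the start region of some maximal simplex, so the source complex is the downward closure of the rows of $B$.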

How should we interpret the remaining Dowker complexes, $\Psi_A$ and $\Psi_B$, for relations $A$ and $B$? To answer this, let’s look at the semantics of simplices in these complexes. A simplex in $\Psi_A$ represents a collection of maximal simplices of $\Delta_G$, namely maximal simplices that have at least one action in common. A simplex in $\Psi_B$ again represents a collection of maximal simplices of $\Delta_G$, now with at least one source state in common. Thus $\Psi_A \subseteq \Psi_B$. Moreover, from Dowker duality one obtains:

Lemma 85. Let $G = (V, \mathfrak{A})$ be a graph as discussed in Section 13, with $V \neq \emptyset$.

Then the inclusion $\iota : \mathfrak{F}(\Psi_A) \to \mathfrak{F}(\Psi_B)$ is a homotopy equivalence.

Comment: The assumption $V \neq \emptyset$ means $\Delta_G$ and $\Delta^G$ are not void, so relation $B$ is not void. If $V \neq \emptyset$ but $\mathfrak{A} = \emptyset$, then technically relation $A$ is void, but it is convenient to think of it as an instance of the empty relation instead, with associated empty Dowker complexes.


Proof.  Consider  the  following  diagram: 


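$$
\begin{array}{ccc}
\mathfrak{F}(\Psi_A) & \xrightarrow{\ \iota\ } & \mathfrak{F}(\Psi_B) \\
\big\uparrow{\scriptstyle \psi_A} & & \big\uparrow{\scriptstyle \psi_B} \\
\mathfrak{F}(\Phi_A) & & \mathfrak{F}(\Phi_B) \\
\| & & \| \\
\mathfrak{F}(\Delta_G) & \xrightarrow{\ \mathrm{src}\ } & \mathfrak{F}(\Delta^G)
\end{array}
$$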


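Recall that $\psi_A$, $\psi_B$, and src are homotopy equivalences.

Let $\mathfrak{M}$ denote the maximal simplices of $\Delta_G$. Observe the following, for each $\sigma \in \mathfrak{F}(\Delta_G)$:

$$(\iota \circ \psi_A)(\sigma) = \{\sigma' \in \mathfrak{M} \mid \sigma \subseteq \sigma'\}.$$

$$(\psi_B \circ \mathrm{src})(\sigma) = \{\sigma' \in \mathfrak{M} \mid \mathrm{src}(\sigma) \subseteq \mathrm{src}(\sigma')\}.$$

If $\sigma \subseteq \sigma'$, then $\mathrm{src}(\sigma) \subseteq \mathrm{src}(\sigma')$.

Consequently, $(\iota \circ \psi_A)(\sigma) \leq (\psi_B \circ \mathrm{src})(\sigma)$ for every $\sigma \in \mathfrak{F}(\Delta_G)$, where “$\leq$” refers to the partial order on $\mathfrak{F}(\Psi_B)$.

We conclude that the two order-reversing poset maps $\iota \circ \psi_A$ and $\psi_B \circ \mathrm{src}$ are homotopic (see [1], Theorem 10.11) and therefore that $\iota$ is a homotopy equivalence. □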
Lemma 86. Let $G = (V, \mathfrak{A})$ be a graph as discussed in Section 13, with $V \neq \emptyset$. Then src induces a homotopy equivalence of posets $P_A \to P_B$ with explicit formula

$$(\tau, \sigma) \mapsto \big((\psi_B \circ \mathrm{src})(\sigma),\ (\phi_B \circ \psi_B \circ \mathrm{src})(\sigma)\big).$$

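Proof. Let $\mathrm{cl}_A$ denote the image of the closure operator $\phi_A \circ \psi_A : \mathfrak{F}(\Phi_A) \to \mathfrak{F}(\Phi_A)$ and let $\mathrm{cl}_B$ denote the image of the closure operator $\phi_B \circ \psi_B : \mathfrak{F}(\Phi_B) \to \mathfrak{F}(\Phi_B)$. We then have the following diagram of homotopy equivalences: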

Pa  — ^  cU  J(<hA)  =  £(AG)  5(Ag)  =  Z($B)  clB  ^  PB- 

(Here $\pi_2$ is projection onto the second coordinate, i.e., $\pi_2(\tau, \sigma) = \sigma$ and each of the occurrences of $\iota$ is an inclusion.)

The  composition  of  all  these  maps  is  an  order-preserving  poset  map  with  the  specified 
formula.  The  overall  map  is  a  homotopy  equivalence  because  each  of  its  constituent  maps  is  a 
homotopy  equivalence.  □ 

Corollary 87. If $G$ is fully controllable in Lemma 86, then the formula for the poset map becomes $(\tau, \sigma) \mapsto \big((\psi_B \circ \mathrm{src})(\sigma),\ \mathrm{src}(\sigma)\big)$.

Proof. Since $G$ is fully controllable, $\Phi_B = \Delta^G = \partial(V) \simeq \mathbb{S}^{n-2}$, with $n = |V|$. So $\Phi_B$ has no free faces, implying that $\phi_B \circ \psi_B$ is the identity, by Lemma 61 on page 94. □

Two Observations: Assume that $G$ is a fully controllable graph. (i) No action can appear in all maximal simplices of $\Delta_G$ as that would mean $\Delta_G$ would be a cone, so not homotopic to a sphere. Consequently, $\hat{1}_A = (\mathfrak{M}, \gamma)$ has $\gamma = \emptyset$ (recall that $\mathfrak{M}$ is the collection of all maximal simplices of $\Delta_G$). (ii) Even if all actions of $\mathfrak{A}$ appear individually as vertices of $\Delta_G$, $\hat{0}_A = (\tau, \mathfrak{A})$ has $\tau = \emptyset$, since $\mathrm{src}(\mathfrak{A}) = V$ and $V \notin \partial(V)$.

These observations mean that $P_A$ does not contain either the top element $\hat{1}_A$ or the bottom element $\hat{0}_A$ of $P_A^+$, when $G$ is fully controllable.

H.2  Delaying  Strategy  and  Goal  Recognition 

We  now  turn  to  the  proof  of  the  main  theorem,  the  statement  of  which  is  replicated  here: 

Theorem 31 (Delaying Strategy Identification). Let $G = (V, \mathfrak{A})$ be a fully controllable graph, with $n = |V| > 1$. Let $A$ be the relation constructed as in Lemma 29 on page 67 and let $P_A$ be its associated doubly-labeled poset. Then:

For each $v \in V$, there exists a maximal strategy $\sigma_v \in \Delta_G$ for attaining singleton goal state $v$ such that $P_A$ contains at least $(n-1)!$ distinct maximal chains for identifying $\sigma_v$, with each chain consisting of at least $n-1$ elements.

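Proof. Let $P_A^{op}$ be $P_A$ but with the opposite order. Then $P_A^{op}$ is almost a join-based lattice, with join operation for elements of $P_A^{op}$ given by

$$(\tau_1, \sigma_1) \vee (\tau_2, \sigma_2) = \begin{cases} \big(\tau_1 \cap \tau_2,\ (\phi_A \circ \psi_A)(\sigma_1 \cup \sigma_2)\big), & \text{when } \tau_1 \cap \tau_2 \neq \emptyset; \\ \hat{1}, & \text{otherwise.} \end{cases}$$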
The maximal elements of $P_A^{op}$ are of the form $(\{\sigma\}, \sigma)$, with $\sigma$ varying over the maximal simplices of $\Delta_G$. Each minimal element of $P_A^{op}$ is of the form $(\psi_A(\{a\}), (\phi_A \circ \psi_A)(\{a\}))$, with action $a$ some vertex of $\Delta_G$. (Aside: not every element of that form is necessarily minimal.)

Since $G$ is fully controllable, $\Delta(P_A^{op}) \simeq \mathbb{S}^{n-2}$, which has reduced homology in dimension $k = n-2$. By the proof of Theorem 25, on page 117, there exists a homology generator $z$ for $\Delta(P_A^{op})$ with support $S = \|z\|$ such that $P_A^{op}$ contains, for each $p \in S_{\max}$, a collection of maximal chains passing through $p$ with the following property: Even if one merely considers the portions of the chains at and below $p$, the collection contains at least $(n-1)!$ distinct such subchains and each subchain has length at least $n-2$. Each full chain, being maximal, must be a path in $P_A^{op}$ between some top element $(\{\sigma\}, \sigma)$ and some bottom element $(\psi_A(\{a\}), (\phi_A \circ \psi_A)(\{a\}))$. Working upward from the bottom in $(P_A^+)^{op}$ (which is equivalent to working downward from the top in $P_A^+$), each such chain therefore gives rise to an informative action release sequence for identifying $\sigma$, consisting of at least $n-1$ actions. Moreover, there are at least $(n-1)!$ different such sequences for that same strategy $\sigma$; we can hold fixed the portion of any chain at and above $p$ in $P_A^{op}$, while varying the portion below $p$ in at least $(n-1)!$ different ways.

Let $p \in S_{\max}$ and suppose $c$ is some maximal chain of $P_A^{op}$ that passes through $p$ and touches top element $(\{\sigma\}, \sigma)$. Pick $q \in S$, with $q \leq p$ (here “$\leq$” is the order on $P_A^{op}$). Write $p = (\tau_p, \sigma_p)$ and $q = (\tau_q, \sigma_q)$. Even though $q$ may not be part of chain $c$, we can still conclude that $\sigma_q \subseteq \sigma_p \subseteq \sigma$. If additionally $\mathrm{src}(\sigma_q) = V \setminus \{v\}$, then $\sigma$ at the top must be a maximal strategy for attaining singleton goal state $v$. In order to prove the theorem, it is therefore enough to show that for any $v \in V$ some such $q$ exists.

Recall the source relation $B$ from Lemma 84. Let $P_B^{op}$ be $P_B$ but with the opposite order. Referring back to the notation in the proof of Lemma 86, and using the fact that $G$ is fully controllable, one sees that $\Delta(P_B^{op}) \cong \Delta(\mathrm{cl}_B) = \Delta(\mathfrak{F}(\Phi_B)) = \mathrm{sd}(\partial(V))$, with “$\cong$” meaning “isomorphic” and “sd” meaning “first barycentric subdivision”. The isomorphism holds by definition of $P_B$. The first equality holds because $\phi_B \circ \psi_B$ is the identity when $G$ is fully controllable, as we saw in the proof of Corollary 87. The second equality amounts to the definition of first barycentric subdivision, bearing in mind that $\Phi_B = \Delta^G = \partial(V)$.

The homotopy equivalence of Lemma 86 carries over to this setting as $\theta : \Delta(P_A^{op}) \to \mathrm{sd}(\partial(V))$ and Corollary 87 provides an explicit formula. Specifically, for vertices $(\tau, \sigma)$ of $\Delta(P_A^{op})$, one has $\theta(\tau, \sigma) = \mathrm{src}(\sigma)$.

Since $\theta$ is a homotopy equivalence, the induced map $\theta_*$ on reduced homology must map the homology generator $z$ to a homology generator for the triangulated $(n-2)$-sphere $\mathrm{sd}(\partial(V))$. Consequently, $\|\theta_*(z)\|$ must consist of all nonempty proper subsets of $V$. In particular, for each $v \in V$, there is some $q = (\tau_q, \sigma_q) \in \|z\|$ such that $\mathrm{src}(\sigma_q) = \theta(q) = V \setminus \{v\}$, as desired. □

H.3  Hamiltonian  Flexibility 

The  next  lemma  establishes  the  second  bullet  in  the  comments  on  page  68. 

Definition 88 (Complete Strategy). Let $G = (V, \mathfrak{A})$ be a graph as discussed in Section 13. A complete strategy for attaining state $v$ is a strategy $\sigma$ that has at least one action at every state other than $v$. So $\sigma \in \Delta_G$ and $\mathrm{src}(\sigma) = V \setminus \{v\}$.
Lemma 89 (Delaying Goal Identification). Let $G = (V, \mathfrak{A})$ be a fully controllable graph. Let $n = |V|$. Suppose $s \in V$ is some desired goal state.

There exists a sequence of actions $a_1, a_2, \ldots, a_{n-1}$ in $\mathfrak{A}$ satisfying the following conditions:

(i) $\{a_1, \ldots, a_{n-1}\}$ is a complete strategy for attaining $s$.

(ii) For each $i = 1, \ldots, n-1$, let $\sigma_i = \{a_1, \ldots, a_i\}$ and $W_i = \mathrm{src}(\sigma_i)$. Then for each $v \in V \setminus W_i$, there exists a complete strategy $\sigma$ for attaining $v$, such that $\sigma_i \subseteq \sigma \in \Delta_G$.

Comments:

(a) Condition (i) implies that no two of the actions $a_1, \ldots, a_{n-1}$ have the same source state.

(b) Condition (ii) further implies that the sequence $a_1, \ldots, a_{n-1}$ forms an informative attribute release sequence for the relation $A$ defined in Lemma 29 on page 67. The reason is that any state $v \in V \setminus W_i$ could still be a goal state after an observer has seen the actions $a_1, \ldots, a_i$, so the observer cannot predict even the source of the next action to be released.

Proof. For the proof, we assume that $\mathfrak{A}$ contains only deterministic and nondeterministic actions, not stochastic ones. The proof generalizes to graphs that include stochastic actions (in addition to deterministic and nondeterministic) by an argument in [7]. The essence of that argument is that the source complex of a graph does not change if one replaces stochastic transitions by deterministic ones.

We  sketch  the  rest  of  the  proof,  assuming  all  actions  are  deterministic  or  nondeterministic. 

Since $G$ is fully controllable, for each state in $V$ there must be a deterministic transition to that state (from some other state). Backchaining such transitions gives rise to a cycle of deterministic actions, since the graph is finite. If that cycle is Hamiltonian, then we may choose $a_1, \ldots, a_{n-1}$ to be any ordering of those $n$ deterministic actions except that we omit the action whose source is $s$.

Suppose instead that the cycle of deterministic actions covers only a proper subset $W$ of the state space $V$. Form a quotient graph with state space $V' = \{o\} \cup (V \setminus W)$, where $o$ represents all of $W$ collapsed to a point. Inductively, the lemma’s assertions hold for the quotient graph.
One  then  needs  to  show  how  to  combine  the  actions  determined  by  the  quotient  graph  with  the 
cycle  on  W  in  order  to  satisfy  the  lemma’s  assertions  for  the  original  graph  G.  That  argument 
is  straightforward  if  a  bit  tedious,  so  we  omit  it.  □ 

The  next  lemma  establishes  the  Hamiltonian  “yes”  in  the  first  bullet  on  page  68. 

Definition 90 (Hamiltonian Action Cycle). Let $G = (V, \mathfrak{A})$ be a graph whose actions may have uncertain outcomes. A sequence of actions $a_1, \ldots, a_n$, with $n = |V|$, is a Hamiltonian cycle of actions whenever:

(i) No two actions have the same source state.

(ii) Each action is either deterministic or stochastic (so, nondeterministic is disallowed).

(iii) The source of action $a_{i+1}$ is a target of action $a_i$, for all $i = 1, \ldots, n-1$, and $\mathrm{src}(a_1)$ is a target of $a_n$.
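The three conditions of Definition 90 can be checked mechanically. The Python sketch below is a minimal illustration under stated assumptions: the `Action` record, its field names, and the string labels for the action kinds are hypothetical choices made for this example, not notation from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    source: int
    targets: frozenset   # possible outcome states of the action
    kind: str            # "deterministic", "nondeterministic", or "stochastic"

def is_hamiltonian_action_cycle(actions, states):
    """Check conditions (i)-(iii) for a sequence of actions over the given states."""
    n = len(states)
    if len(actions) != n:
        return False
    # (i) no two actions share a source state (so the sources cover all states)
    if {a.source for a in actions} != set(states):
        return False
    # (ii) only deterministic or stochastic actions are allowed
    if any(a.kind not in ("deterministic", "stochastic") for a in actions):
        return False
    # (iii) each action can lead to the source of the next one, wrapping around
    return all(actions[(i + 1) % n].source in actions[i].targets for i in range(n))

a1 = Action("a1", 1, frozenset({2}), "deterministic")
a2 = Action("a2", 2, frozenset({3, 1}), "stochastic")
a3 = Action("a3", 3, frozenset({1}), "deterministic")

print(is_hamiltonian_action_cycle([a1, a2, a3], {1, 2, 3}))
print(is_hamiltonian_action_cycle([a1, a3, a2], {1, 2, 3}))
```

The second call fails condition (iii), since state 3 is not a target of `a1`; the order of the sequence matters, not just the set of actions.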




Observe: Any proper subset of a Hamiltonian cycle of actions is a simplex in $\Delta_G$.

(That observation requires understanding the definition of $\Delta_G$ when stochastic actions are involved: stochastic cycles are fine, so long as they are not recurrent. See [7] for details.)

Lemma 91 (Delaying Identification of a Given Strategy). Let $G = (V, \mathfrak{A})$ be a fully controllable graph. Assume $\mathfrak{A}$ contains a Hamiltonian cycle of actions $a_1, \ldots, a_n$, with $n = |V|$.

Suppose $\sigma_v$ is a maximal and complete strategy in $\Delta_G$ for attaining $v \in V$. Then $\sigma_v$ contains actions $b_1, \ldots, b_{n-1}$ that constitute a complete strategy for attaining $v$ and that form an informative attribute release sequence for relation $A$.

(Recall: Relation $A$ was defined in Lemma 29 on page 67; it models the maximal simplices of $\Delta_G$ in terms of their constituent actions.)

Proof. Let $\sigma_v$ be as specified.

We can assume without loss of generality that $V = \{1, \ldots, n\}$, that $\mathrm{src}(a_i) = i$ for all $i \in V$, and that $v = n > 1$.

Now let $b_1, \ldots, b_{n-1}$ be any actions in $\sigma_v$ chosen so that $\mathrm{src}(b_i) = i$, for $i = 1, \ldots, n-1$.

(If $b_i = a_i$ for some $i$, that is fine.)

Then $\{b_1, \ldots, b_{n-1}\}$ is itself a complete strategy for attaining $v$.

We claim that the release order $b_{n-1}, \ldots, b_1$ constitutes an informative attribute release sequence for relation $A$. In fact, we will prove the stronger assertion:

Claim: Pick some $i \in \{1, \ldots, n\}$. Then:

For each $s \in \{n\} \cup \{1, \ldots, i-1\}$, there exists a complete strategy $\sigma_s \in \Delta_G$ for attaining $s$, with $\{b_i, b_{i+1}, \ldots, b_{n-1}\} \subseteq \sigma_s$.

(Notation: $\{b_i, b_{i+1}, \ldots, b_{n-1}\} = \emptyset$ when $i = n$.)

Consequently, after an observer has seen $b_{n-1}, \ldots, b_i$, the observer cannot predict even the source state of the next action to be released, and so the action sequence is informative for $A$.

The claim certainly holds for $s = n$, using the original $\sigma_v$. Now consider an $s \in \{1, \ldots, i-1\}$ and let $\sigma_s = \{a_1, \ldots, a_{s-1}\} \cup \{b_{s+1}, \ldots, b_{n-1}\} \cup \{a_n\}$. By arguments from [7], $\sigma_s \in \Delta_G$. □
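To see the construction of $\sigma_s$ at work, here is a small Python sketch for a purely deterministic toy case with $n = 4$. It is an illustration under assumptions, not the general argument of [7]: the graph and the choice of $b_1, b_2, b_3$ are hypothetical, and a set of deterministic actions is treated as a strategy exactly when it contains no directed cycle.

```python
# Hypothetical deterministic example; each action is a (source, target) pair.
n = 4
a = {i: (i, i % n + 1) for i in range(1, n + 1)}   # Hamiltonian cycle 1->2->3->4->1
b = {1: (1, 3), 2: (2, 3), 3: (3, 4)}              # complete strategy attaining v = n

def is_acyclic(edges):
    """True if the deterministic actions (edges) contain no directed cycle."""
    edges = set(edges)
    while edges:
        sources = {u for u, _ in edges}
        pruned = {(u, w) for (u, w) in edges if w in sources}
        if pruned == edges:     # no edge can be dropped, so a cycle remains
            return False
        edges = pruned
    return True

def sigma(s):
    """sigma_s = {a_1..a_{s-1}} | {b_{s+1}..b_{n-1}} | {a_n}, for 1 <= s <= n-1."""
    return ({a[i] for i in range(1, s)}
            | {b[i] for i in range(s + 1, n)}
            | {a[n]})

assert is_acyclic(b.values())                       # the b's form a strategy
for s in range(1, n):
    sig = sigma(s)
    assert is_acyclic(sig)                                        # a strategy (by assumption)
    assert {u for u, _ in sig} == set(range(1, n + 1)) - {s}      # complete for s
    for i in range(s + 1, n + 1):                                 # claim, for every i > s
        assert {b[j] for j in range(i, n)} <= sig
print("all checks passed")
```

In this example every $\sigma_s$ has an action at each state other than $s$ and contains the already released actions $b_i, \ldots, b_{n-1}$ whenever $s < i$, which is what leaves the observer unable to rule out $s$ as the goal.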

Caution: As mentioned on page 68, just because $b_{n-1}, \ldots, b_1$ as produced by Lemma 91 is an informative attribute release sequence for $A$, that does not mean one should always release actions in that fashion. If the release protocol were so rigid, an adversary familiar with the protocol would be able to infer much about the goal. In particular, the target set of $b_{n-1}$ includes the goal state, so if that action is deterministic, then the adversary would be able to infer the goal from the first action released.
I  Morphisms  and  Lattice  Generators

The  aim  of  this  appendix  is  to  prove  the  claims  of  Section  14,  ending  with  Theorem  40.  That 
theorem  shows  how  a  surjective  morphism  of  relations  can  use  lattice  operations  to  fully  cover 
the  image  lattice  even  when  the  poset  map  induced  by  the  morphism  is  not  itself  surjective. 

I.1 Morphisms

Notation reminder: We frequently will be working with two relations: $R$ is a relation on $X^R \times Y^R$ and $Q$ is a relation on $X^Q \times Y^Q$. In order to distinguish rows and columns between the two, we also use notation of the form $X^R_y$, $Y^R_x$, $X^Q_y$, and $Y^Q_x$.

Now  recall  the  definition  of  morphism  from  page  70: 

Definition 32 (Morphism). Let $R$ be a relation on $X^R \times Y^R$ and let $Q$ be a relation on $X^Q \times Y^Q$. A morphism of relations $f : R \to Q$ is a pair of set functions:

$$f_X : X^R \to X^Q$$
$$f_Y : Y^R \to Y^Q$$

such that $(f_X(x), f_Y(y)) \in Q$ whenever $(x, y) \in R$.
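The defining condition of a morphism is easy to test on finite relations. In the Python sketch below, relations are represented as sets of (row, column) pairs and the component maps as dictionaries; this representation, the function name `is_morphism`, and the sample data are all illustrative assumptions.

```python
def is_morphism(f_X, f_Y, R, Q):
    """Check that (f_X(x), f_Y(y)) is in Q for every (x, y) in R."""
    return all((f_X[x], f_Y[y]) in Q for (x, y) in R)

# Hypothetical relations, given as sets of (row, column) pairs.
R = {("x1", "y1"), ("x2", "y1"), ("x2", "y2")}
Q = {("u", "w1"), ("u", "w2")}

f_X = {"x1": "u", "x2": "u"}
f_Y = {"y1": "w1", "y2": "w2"}
print(is_morphism(f_X, f_Y, R, Q))        # every image pair lies in Q

Q_small = {("u", "w1")}
print(is_morphism(f_X, f_Y, R, Q_small))  # (x2, y2) maps to (u, w2), outside Q_small
```

Note that the condition only constrains pairs in $R$: a morphism may collapse rows or columns, as `f_X` does here, so long as related pairs stay related.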

Throughout  this  appendix,  'morphism'  refers  to  Definition  32.  When  the  time  comes,  we 
will  refer  to  'G-morphism'  explicitly  (see  again  Definitions  35  and  38  on  pages  75  and  76). 

Morphism Equality: Before proving properties about morphisms, we should give a notion of morphism equality. Suppose $g, h : R \to Q$ are two morphisms. We will say that $g = h$ if and only if $(g_X(x), g_Y(y)) = (h_X(x), h_Y(y))$ for all $(x, y) \in R$. In particular, we do not care what the constituent set maps do on elements that are not relevant to the relations viewed as sets of pairs. (Note: The condition stated is equivalent to requiring $g_X(x) = h_X(x)$ and $g_Y(y) = h_Y(y)$ for all $(x, y) \in R$.)
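A short Python sketch illustrates this notion of equality. The data are hypothetical: the row `x3` is a blank row of $R$, so two morphisms that disagree only there still count as equal. Relations are sets of pairs and component maps are dictionaries, as an assumed representation.

```python
def morphisms_equal(g, h, R):
    """g and h are (f_X, f_Y) pairs of dicts; compare them only on pairs of R."""
    (g_X, g_Y), (h_X, h_Y) = g, h
    return all((g_X[x], g_Y[y]) == (h_X[x], h_Y[y]) for (x, y) in R)

# Hypothetical relation R on rows {x1, x2, x3} and columns {y1}; row x3 is blank.
R = {("x1", "y1"), ("x2", "y1")}

g = ({"x1": "u1", "x2": "u2", "x3": "u1"}, {"y1": "w"})
h = ({"x1": "u1", "x2": "u2", "x3": "u2"}, {"y1": "w"})   # differs from g only at x3
k = ({"x1": "u2", "x2": "u2", "x3": "u1"}, {"y1": "w"})   # differs from g at x1

print(morphisms_equal(g, h, R))
print(morphisms_equal(g, k, R))
```

Here `g` and `h` are equal as morphisms even though their row maps differ as set functions, while `g` and `k` are not, because `x1` participates in a pair of $R$.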

The  following  lemma  shows  that  the  component  maps  of  a  morphism  between  relations 
may  be  viewed  as  simplicial  maps: 

Lemma 33 (Induced Simplicial Maps). A morphism $f : R \to Q$ between nonvoid relations induces simplicial maps between the Dowker complexes:

$$f_X : \Psi_R \to \Psi_Q$$
$$f_Y : \Phi_R \to \Phi_Q$$

Proof. We need to show that $f_X(\sigma) \in \Psi_Q$ for all $\sigma \in \Psi_R$.
If $\sigma = \emptyset$, then $f_X(\sigma) = \emptyset \in \Psi_Q$ since $Q$ is nonvoid.

If $\sigma = \{x_1, \ldots, x_k\}$, then $f_X(\sigma) = \{f_X(x_1), \ldots, f_X(x_k)\}$.

Since $\emptyset \neq \sigma \in \Psi_R$ there exists $y \in Y^R$ such that $(x, y) \in R$ for all $x \in \sigma$. Thus $(f_X(x), f_Y(y)) \in Q$ for all $x \in \sigma$, by the definition of morphism. So $f_Y(y)$ is a witness for $f_X(\sigma)$ in $Q$, telling us $f_X(\sigma) \in \Psi_Q$.

The argument for the map $f_Y : \Phi_R \to \Phi_Q$ is similar. □
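The witness argument can be checked on a small example. The Python sketch below computes the Dowker complex on the row set of a finite relation (row subsets sharing a common column) and verifies that the row map of a morphism sends simplices to simplices. The relations, maps, and the function name `psi` are hypothetical choices for illustration.

```python
from itertools import combinations

def psi(relation):
    """Dowker complex on the rows: row subsets that share at least one common column."""
    rows = sorted({x for x, _ in relation})
    cols = {y for _, y in relation}
    simplices = {frozenset()} if relation else set()
    for k in range(1, len(rows) + 1):
        for sigma in combinations(rows, k):
            if any(all((x, y) in relation for x in sigma) for y in cols):
                simplices.add(frozenset(sigma))
    return simplices

# Hypothetical nonvoid relations and a morphism between them.
R = {("x1", "y1"), ("x2", "y1"), ("x2", "y2"), ("x3", "y2")}
Q = {("u1", "w1"), ("u2", "w1"), ("u2", "w2")}
f_X = {"x1": "u1", "x2": "u2", "x3": "u2"}
f_Y = {"y1": "w1", "y2": "w2"}

assert all((f_X[x], f_Y[y]) in Q for x, y in R)   # f is a morphism

# Image of each simplex of the row complex of R under f_X.
images = {frozenset(f_X[x] for x in sigma) for sigma in psi(R)}
print(images <= psi(Q))
```

The simplex `{x2, x3}` of the first complex is collapsed to the vertex `{u2}`, which shows that the induced simplicial map may lower dimension.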
Comment:  The  nonvoid  requirement  is  an  artifact,  arising  because  we  sometimes  regard
void  relations  as  having  empty  rather  than  void  Dowker  complexes,  in  the  context  of  links 
(see  Definition  7  on  page  27,  Definition  18  on  page  43,  the  comments  about  void  relations  on 
page  85,  and  the  hypotheses  of  Lemma  54  on  page  90).  The  nonvoid  requirement  of  Lemma  33 
avoids  having  to  worry  about  mapping  from  an  artificially  empty  complex  into  a  void  one. 

Lemma 34 (Morphism Properties). Assume the notation from above and that all relevant relations are nonvoid. Let $f : R \to Q$ be a morphism of relations. Then:

(i) $f_X$ and $f_Y$ are one-to-one set maps $\Longrightarrow$ $f$ is one-to-one $\Longleftrightarrow$ $f$ is a monomorphism.

(ii) $f$ surjective $\Longrightarrow$ $f$ epimorphism $\Longleftrightarrow$ $f_X$ and $f_Y$ are surjective set maps.

(Additional conditions for that last $\Longleftrightarrow$: The $\Longrightarrow$ direction assumes that $Q$ has no blank rows or columns, while the $\Longleftarrow$ direction assumes that $R$ has no blank rows or columns.)
The two uni-directional implications $\Longrightarrow$ above are strict.

(iii) If $f_X : \Psi_R \to \Psi_Q$ is surjective and $Q$ has no blank rows, then $f_X : X^R \to X^Q$ is surjective.

Similarly for $f_Y$, now assuming that $Q$ has no blank columns.

The converses need not hold. Indeed, $f$ itself can be surjective but the maps of simplicial complexes need not be (as we saw with the maps of page 73 and as one can see with simpler examples as well).

(iv) If $f_X : X^R \to X^Q$ is one-to-one, then $f_X : \Psi_R \to \Psi_Q$ is injective. The converse holds if $R$ has no blank rows.

Similarly for $f_Y$, now assuming that $R$ has no blank columns for the converse.

Proof.  We  will  prove  the  various  implications.  Strictness,  i.e.,  failure  of  converses,  where 
mentioned  above,  can  be  seen  readily  with  simple  examples. 
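Two such simple examples, for the strict implications in parts (i) and (ii), are sketched below in Python. The relations and maps are hypothetical; relations are sets of (row, column) pairs and component maps are dictionaries.

```python
def image_pairs(f_X, f_Y, R):
    """Images of the pairs of R under the morphism (f_X, f_Y)."""
    return [(f_X[x], f_Y[y]) for (x, y) in sorted(R)]

# (i) f is one-to-one on pairs although f_X is not a one-to-one set map.
R1 = {("x1", "y1"), ("x2", "y2")}
Q1 = {("u", "w1"), ("u", "w2")}
fX1 = {"x1": "u", "x2": "u"}
fY1 = {"y1": "w1", "y2": "w2"}
imgs1 = image_pairs(fX1, fY1, R1)
assert set(imgs1) <= Q1                                  # f is a morphism
pairs_injective = len(set(imgs1)) == len(R1)
fX1_injective = len(set(fX1.values())) == len(fX1)

# (ii) f_X and f_Y are surjective although f is not surjective on pairs.
R2 = {("x1", "y1"), ("x2", "y2")}
Q2 = {(u, w) for u in ("u1", "u2") for w in ("w1", "w2")}
fX2 = {"x1": "u1", "x2": "u2"}
fY2 = {"y1": "w1", "y2": "w2"}
imgs2 = set(image_pairs(fX2, fY2, R2))
assert imgs2 <= Q2                                       # f is a morphism
components_surjective = (set(fX2.values()) == {u for u, _ in Q2}
                         and set(fY2.values()) == {w for _, w in Q2})
pairs_surjective = imgs2 == Q2

print(pairs_injective, fX1_injective)
print(components_surjective, pairs_surjective)
```

In the second example $R$ has no blank rows or columns, so by part (ii) the morphism is an epimorphism, yet it reaches only two of the four pairs of $Q$.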

Part  (i): 

(a) Let $f_X$ and $f_Y$ be one-to-one set maps.

Suppose $(f_X(x'), f_Y(y')) = (f_X(x), f_Y(y))$. Then $f_X(x') = f_X(x)$, so $x' = x$.

And $f_Y(y') = f_Y(y)$, so $y' = y$. So $f$ is one-to-one as a set map of pairs.

(b) Let $f$ be one-to-one as a set map of pairs.

Suppose $g, h : S \to R$ are morphisms such that $f \circ g = f \circ h$.

Suppose $(x, y) \in S$. By assumption, $(f_X(g_X(x)), f_Y(g_Y(y))) = (f_X(h_X(x)), f_Y(h_Y(y)))$.
Since $f$ is one-to-one, $(g_X(x), g_Y(y)) = (h_X(x), h_Y(y))$.

So $g = h$, by our notion of equality, meaning $f$ is a monomorphism.

(c) Let $f$ be a monomorphism.

Suppose $f(x, y) = f(x', y')$ but $(x, y) \neq (x', y')$. Let $S$ be the relation consisting of the single element $\{(I, \alpha)\}$, with $I$ and $\alpha$ new symbols:
Define two morphisms $g, h : S \to R$ by:

$$g_X : I \mapsto x \qquad h_X : I \mapsto x'$$
$$g_Y : \alpha \mapsto y \qquad h_Y : \alpha \mapsto y'$$

Then $g \neq h$, but $f \circ g = f \circ h$, a contradiction. So $f$ is one-to-one.

Part  (ii): 

(a) Let $f$ be surjective as a set map of pairs.

Suppose $g, h : Q \to S$ are morphisms such that $g \circ f = h \circ f$.

Suppose $(x', y') \in Q$.

By surjectivity, there exists $(x, y) \in R$ such that $(f_X(x), f_Y(y)) = (x', y')$. So:
$(g_X(x'), g_Y(y')) = (g_X(f_X(x)), g_Y(f_Y(y))) = (h_X(f_X(x)), h_Y(f_Y(y))) = (h_X(x'), h_Y(y'))$.
Thus $g = h$ and we see that $f$ is an epimorphism.

(b) Assume $Q$ has no blank rows or columns and let $f$ be an epimorphism.

Suppose $f_Y$ is not surjective, so there exists $y^* \in Y^Q \setminus f_Y(Y^R)$.

Let $S$ be the relation consisting of two elements $\{(I, \alpha), (I, \beta)\}$, with $I$, $\alpha$, $\beta$ new symbols:

$$\begin{array}{c|cc} S & \alpha & \beta \\ \hline I & \bullet & \bullet \end{array}$$

Define two morphisms $g, h : Q \to S$ by:

$$g_X(x) = I \qquad h_X(x) = I \qquad \text{for every } x \in X^Q$$
$$g_Y(y) = \alpha \qquad h_Y(y) = \alpha \qquad \text{for every } y \in Y^Q \setminus \{y^*\}$$
$$g_Y(y^*) = \alpha \qquad h_Y(y^*) = \beta$$

Since $y^* \in Y^Q$ and $Q$ has no blank columns there is at least one $x^* \in X^Q$ such that $(x^*, y^*) \in Q$. So $g \neq h$.

Observe that $g \circ f = h \circ f$ since $y^*$ does not appear in the image of $f_Y$, contradicting $f$ being an epimorphism.

The argument showing that $f_X$ is surjective is similar.

(c) Assume $R$ has no blank rows or columns and let $f_X$ and $f_Y$ be surjective.

Suppose $g, h : Q \to S$ are morphisms such that $g \circ f = h \circ f$.

Suppose $(x,y) \in Q$. We need to show that $g_X(x) = h_X(x)$ and $g_Y(y) = h_Y(y)$, as that means $g = h$, given our definition of equality. We will make the argument for the $X$ coordinate; the $Y$ argument is similar.

Since $f_X$ is surjective, there exists $\bar{x} \in X_R$ such that $f_X(\bar{x}) = x$. Since $R$ has no blank rows, there exists $\bar{y} \in Y_R$ such that $(\bar{x}, \bar{y}) \in R$.

Since $(g \circ f)(\bar{x},\bar{y}) = (h \circ f)(\bar{x},\bar{y})$, one obtains $g_X(x) = g_X(f_X(\bar{x})) = h_X(f_X(\bar{x})) = h_X(x)$.

Part (iii):

Suppose $Q$ has no blank rows and suppose $f_X : \Psi_R \to \Psi_Q$ is surjective as a simplicial map.

Suppose $x \in X_Q$. Since $Q$ has no blank rows, $x$ is a vertex of $\Psi_Q$, so there is some $\bar{x}$ that is a vertex of $\Psi_R$ such that $f_X(\bar{x}) = x$. That says $f_X : X_R \to X_Q$ is also surjective as a set map.

The argument for $f_Y$ assuming $Q$ has no blank columns is similar.

Part  (iv): 

(a) Let $f_X$ be one-to-one as a set map $X_R \to X_Q$. Consider $f_X$ as a simplicial map $\Psi_R \to \Psi_Q$. Suppose $f_X(\sigma) = \kappa = f_X(\tau)$, with $\sigma, \tau \in \Psi_R$ and $\kappa \in \Psi_Q$.

If $\kappa = \emptyset$, then necessarily $\sigma = \tau = \emptyset$. Otherwise, $\sigma \neq \emptyset$ and $\tau \neq \emptyset$, so let $x \in \sigma$. Then $f_X(x) \in \kappa$. So there exists $x' \in \tau$ such that $f_X(x') = f_X(x)$. Since $f_X$ is one-to-one as a set map, that says $x' = x$. Thus $\sigma \subseteq \tau$. A similar argument shows the reverse inclusion, so $\sigma = \tau$. Thus $f_X$ is injective as a simplicial map.

(b) Assume $R$ has no blank rows and let $f_X$ be injective as a simplicial map $\Psi_R \to \Psi_Q$.

Consider $f_X$ as a set map $X_R \to X_Q$ and suppose $f_X(x) = f_X(x')$. Since $R$ has no blank rows, both $x$ and $x'$ are vertices in $\Psi_R$. That means $f_X(\{x\}) = f_X(\{x'\})$ when we view $f_X$ as a simplicial map, so $\{x\} = \{x'\}$ by injectivity, i.e., $x = x'$. So we see that $f_X$ is one-to-one as a set map.

A similar argument holds for the assertions regarding $f_Y$. □

I.2  G-Morphisms

Recall  the  material  of  Section  14.4. 

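Lemma 36 (Containment). Let $f : R \to Q$ be a morphism of nonvoid relations. Then:

(a) $(f_Y \circ \phi_R)(\sigma) \subseteq (\phi_Q \circ f_X)(\sigma)$, for every $\sigma \in \Psi_R$,

(b) $(f_X \circ \psi_R)(\gamma) \subseteq (\psi_Q \circ f_Y)(\gamma)$, for every $\gamma \in \Phi_R$.

Proof. Observe that $(f_Y \circ \phi_R)(\emptyset) = f_Y(Y_R) \subseteq Y_Q = \phi_Q(\emptyset) = (\phi_Q \circ f_X)(\emptyset)$.

Now let $\emptyset \neq \sigma \in \Psi_R$. Let $y \in \phi_R(\sigma) \neq \emptyset$. Then $(x,y) \in R$ for every $x \in \sigma$. Thus $(f_X(x), f_Y(y)) \in Q$ for every $x \in \sigma$. So $f_Y(y) \in \phi_Q(f_X(\sigma))$. This is true for all $y \in \phi_R(\sigma)$, telling us $f_Y(\phi_R(\sigma)) \subseteq \phi_Q(f_X(\sigma))$.

The argument for assertion (b) is similar. □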
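Assertion (a) of Lemma 36 can be spot-checked by brute force on a small example. The sketch below is only an illustration: the relations `R` and `Q` are hypothetical toy data, not taken from the report, and it assumes that a morphism is a pair of set maps $(f_X, f_Y)$ sending every pair of $R$ to a pair of $Q$, with $\phi(\emptyset)$ taken to be the whole attribute set. It enumerates every such pair of maps and tests the containment on every simplex of $\Psi_R$.

```python
from itertools import combinations, product

# Hypothetical toy relations, chosen only for illustration.
XR, YR = [1, 2, 3], ["a", "b", "c"]
R = {(1, "a"), (1, "b"), (2, "b"), (2, "c"), (3, "c"), (3, "a")}
XQ, YQ = ["p", "q"], ["u", "v"]
Q = {("p", "u"), ("q", "u"), ("q", "v")}


def phi(rel, sigma, ys):
    """Attributes shared by every individual in sigma (all of ys if sigma is empty)."""
    return frozenset(y for y in ys if all((x, y) in rel for x in sigma))


def psi_complex(rel, xs, ys):
    """Simplices of Psi: the empty set plus sets of individuals with a common attribute."""
    cells = [frozenset()]
    for size in range(1, len(xs) + 1):
        for sigma in combinations(xs, size):
            if phi(rel, sigma, ys):
                cells.append(frozenset(sigma))
    return cells


def morphisms():
    """All pairs of set maps (f_X, f_Y) that send every pair of R into Q."""
    x_maps = product(XQ, repeat=len(XR))
    y_maps = product(YQ, repeat=len(YR))
    for x_img, y_img in product(x_maps, y_maps):
        fx, fy = dict(zip(XR, x_img)), dict(zip(YR, y_img))
        if all((fx[x], fy[y]) in Q for x, y in R):
            yield fx, fy


def containment_a(fx, fy):
    """Check (f_Y o phi_R)(sigma) <= (phi_Q o f_X)(sigma) for every sigma in Psi_R."""
    for sigma in psi_complex(R, XR, YR):
        lhs = frozenset(fy[y] for y in phi(R, sigma, YR))
        rhs = phi(Q, frozenset(fx[x] for x in sigma), YQ)
        if not lhs <= rhs:
            return False
    return True


found = list(morphisms())
print(len(found), all(containment_a(fx, fy) for fx, fy in found))
```

A check of assertion (b) would follow the same pattern with the roles of individuals and attributes exchanged.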

Corollary 37 (Homotopic Face Maps). Let $f : R \to Q$ be a morphism of nonvoid relations. Then:

(a) $f_X$ and $\psi_Q \circ f_Y \circ \phi_R$ are homotopic poset maps $\mathfrak{F}(\Psi_R) \to \mathfrak{F}(\Psi_Q)$,

(b) $f_Y$ and $\phi_Q \circ f_X \circ \psi_R$ are homotopic poset maps $\mathfrak{F}(\Phi_R) \to \mathfrak{F}(\Phi_Q)$.

Proof. Let $\sigma \in \mathfrak{F}(\Psi_R)$.

By Lemma 36, $(f_Y \circ \phi_R)(\sigma) \subseteq (\phi_Q \circ f_X)(\sigma)$.

Therefore, since $\psi_Q$ reverses inclusions, $(\psi_Q \circ f_Y \circ \phi_R)(\sigma) \supseteq (\psi_Q \circ \phi_Q \circ f_X)(\sigma)$.

So $(\psi_Q \circ f_Y \circ \phi_R)$ and $(\psi_Q \circ \phi_Q \circ f_X)$ are homotopic maps (see [1], Theorem 10.11).

Since $\psi_Q \circ \phi_Q$ is homotopic to the identity on $\mathfrak{F}(\Psi_Q)$, part (a) follows.

The proof of (b) is similar. □


Corollary 39 (Homotopic Poset Maps). Let $f : R \to Q$ be a morphism of nonvoid relations. The induced G-morphisms $f_X^G, f_Y^G : P_R \to P_Q$ are homotopic. (See Definition 38 on page 76.)

Proof. See Figure 49 on page 75 for the underlying maps comprising the G-morphisms. The G-morphisms are defined as follows:

For all $(\sigma,\gamma) \in P_R$:

$f_X^G(\sigma,\gamma) = (\sigma',\gamma')$, with $\sigma' = (\psi_Q \circ f_Y \circ \phi_R)(\sigma)$ and $\gamma' = \phi_Q(\sigma')$.

$f_Y^G(\sigma,\gamma) = (\sigma'',\gamma'')$, with $\gamma'' = (\phi_Q \circ f_X \circ \psi_R)(\gamma)$ and $\sigma'' = \psi_Q(\gamma'')$.

These definitions make sense because $f_X$ and $f_Y$ map nonempty simplices to nonempty simplices and because the images of $\psi_Q$ and $\phi_Q$ may be viewed as lying in $P_Q$, by Corollary 45 on page 87. (Similarly, the images of $\psi_R$ and $\phi_R$ may be viewed as lying in $P_R$. In fact, as used above, these maps are simply switching between the $\sigma$ and $\gamma$ components ("labels") of the given element $(\sigma,\gamma)$ in $P_R$.) Observe that $f_X^G$ and $f_Y^G$ are order-preserving poset maps.

Applying Lemma 36 and since $(\sigma,\gamma) \in P_R$:

$(f_Y \circ \phi_R)(\sigma) \subseteq (\phi_Q \circ f_X)(\sigma) = (\phi_Q \circ f_X \circ \psi_R)(\gamma) = \gamma''$.

Consequently:

$\sigma' = (\psi_Q \circ f_Y \circ \phi_R)(\sigma) \supseteq \psi_Q(\gamma'') = \sigma''$.

So the maps are homotopic (see [1], Theorem 10.11). □
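The comparison $\sigma' \supseteq \sigma''$ at the heart of this proof can also be checked mechanically. The following sketch uses hypothetical toy relations (not from the report) and assumes that $P_R$ consists of pairs $(\sigma,\gamma)$ with both components nonempty, $\gamma = \phi_R(\sigma)$, and $\sigma = \psi_R(\gamma)$. It evaluates both G-morphisms from the formulas in the proof, for every pair of set maps that forms a morphism.

```python
from itertools import combinations, product

# Hypothetical toy relations, chosen only for illustration.
XR, YR = [1, 2, 3], ["a", "b", "c"]
R = {(1, "a"), (1, "b"), (2, "b"), (2, "c"), (3, "c"), (3, "a")}
XQ, YQ = ["p", "q"], ["u", "v"]
Q = {("p", "u"), ("q", "u"), ("q", "v")}


def phi(rel, sigma, ys):
    """Attributes shared by all individuals in sigma."""
    return frozenset(y for y in ys if all((x, y) in rel for x in sigma))


def psi(rel, gamma, xs):
    """Individuals having all attributes in gamma."""
    return frozenset(x for x in xs if all((x, y) in rel for y in gamma))


def doubly_labeled_poset(rel, xs, ys):
    """Pairs (sigma, gamma), both nonempty, with gamma = phi(sigma), sigma = psi(gamma)."""
    pairs = set()
    for size in range(1, len(xs) + 1):
        for sigma in map(frozenset, combinations(xs, size)):
            gamma = phi(rel, sigma, ys)
            if gamma and psi(rel, gamma, xs) == sigma:
                pairs.add((sigma, gamma))
    return pairs


def g_morphisms(fx, fy, sigma, gamma):
    """Return (f_X^G(sigma, gamma), f_Y^G(sigma, gamma)) using the proof's formulas."""
    s1 = psi(Q, frozenset(fy[y] for y in phi(R, sigma, YR)), XQ)
    g2 = phi(Q, frozenset(fx[x] for x in psi(R, gamma, XR)), YQ)
    return (s1, phi(Q, s1, YQ)), (psi(Q, g2, XQ), g2)


def comparable_everywhere(fx, fy):
    """True if the sigma-component of f_X^G contains that of f_Y^G on all of P_R."""
    for sigma, gamma in doubly_labeled_poset(R, XR, YR):
        (s1, _), (s2, _) = g_morphisms(fx, fy, sigma, gamma)
        if not s1 >= s2:
            return False
    return True


results = []
for x_img, y_img in product(product(XQ, repeat=3), product(YQ, repeat=3)):
    fx, fy = dict(zip(XR, x_img)), dict(zip(YR, y_img))
    if all((fx[x], fy[y]) in Q for x, y in R):  # keep only morphisms
        results.append(comparable_everywhere(fx, fy))
print(all(results))
```

For this toy relation `R`, the poset $P_R$ has six elements: three singletons of individuals and three pairs, each labeled by its shared attributes.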

I.3  Lattice  Generators

We  turn  now  to  the  main  result. 

(Recall  that  a  relation  is  tight  when  it  has  no  blank  rows  or  columns.) 

Lemma 92 (Generators in Image). Let $f : R \to Q$ be a surjective morphism between nonvoid tight relations.

Suppose $q \in P_Q$ is of the form $(X_y, (\phi_Q \circ \psi_Q)(\{y\}))$, for some $y \in Y_Q$.

Then there exist $q_1, \ldots, q_k$ in the image of $f_X^G : P_R \to P_Q$, with $k \geq 1$, such that $q = \bigvee_{i=1}^{k} q_i$. (Here $\bigvee$ is the join operation of $P_Q$.)

Proof. By Lemma 34(iii), the component functions $f_X : X_R \to X_Q$ and $f_Y : Y_R \to Y_Q$ are surjective. Since $f_Y$ is surjective, $f_Y^{-1}(\{y\}) = \{y_1, \ldots, y_k\} \subseteq Y_R$, for some $k \geq 1$.

For each $i = 1, \ldots, k$, observe and define the following:

• Since $R$ has no blank columns, $X_{y_i} \neq \emptyset$, so $(X_{y_i}, (\phi_R \circ \psi_R)(\{y_i\})) \in P_R$.

• Define $\sigma_i$ as the "$\sigma$-component" of $f_X^G(X_{y_i}, (\phi_R \circ \psi_R)(\{y_i\}))$, meaning:

$\sigma_i = (\psi_Q \circ f_Y \circ \phi_R)(X_{y_i}) = \psi_Q(\gamma) = \bigcap_{y' \in \gamma} X_{y'}$, with $\gamma = f_Y((\phi_R \circ \psi_R)(\{y_i\}))$.

• Observe that $y \in \gamma$ since $y = f_Y(y_i)$ and $y_i \in (\phi_R \circ \psi_R)(\{y_i\})$. Therefore $\sigma_i \subseteq X_y$.

• Define $q_i = (\sigma_i, \gamma_i) \in P_Q$, with $\gamma_i = \phi_Q(\sigma_i)$. So $q_i$ is in the image of $f_X^G : P_R \to P_Q$.

We need to show that $q = \bigvee_{i=1}^{k} q_i$. Expanding, we see:

$$\bigvee_{i=1}^{k} q_i = \Big( (\psi_Q \circ \phi_Q)\Big(\bigcup_{i=1}^{k} \sigma_i\Big),\ \bigcap_{i=1}^{k} \gamma_i \Big).$$

By a bullet above, $\bigcup_{i=1}^{k} \sigma_i \subseteq X_y$, so:

$$\bigcup_{i=1}^{k} \sigma_i \subseteq (\psi_Q \circ \phi_Q)\Big(\bigcup_{i=1}^{k} \sigma_i\Big) \subseteq (\psi_Q \circ \phi_Q)(X_y) = X_y.$$

We will establish $X_y \subseteq \bigcup_{i=1}^{k} \sigma_i$, thereby completing the proof.

Let $x \in X_y$.

So $(x,y) \in Q$. By surjectivity, there exists $(\bar{x},\bar{y}) \in R$ such that $f_X(\bar{x}) = x$ and $f_Y(\bar{y}) = y$. Now $\bar{y} = y_j$, for some $j \in \{1, \ldots, k\}$, as defined earlier. Thus $\bar{x} \in X_{y_j}$.

Consequently, for every $z \in (\phi_R \circ \psi_R)(\{y_j\})$, $(\bar{x}, z) \in R$ and so $(f_X(\bar{x}), f_Y(z)) \in Q$.

That means $(x,y') \in Q$ for every $y' \in f_Y((\phi_R \circ \psi_R)(\{y_j\}))$. Therefore, $x \in \sigma_j \subseteq \bigcup_{i=1}^{k} \sigma_i$ and we conclude that $X_y \subseteq \bigcup_{i=1}^{k} \sigma_i$.

(Note: $X_y$ need not lie in a single $\sigma_j$, since $j$ depends on $x$.) □

Corollary 93. Assume the hypotheses of Lemma 92.

Suppose further that for some $y_i \in f_Y^{-1}(\{y\})$, $(\phi_R \circ \psi_R)(\{y_i\}) = \{y_i\}$. Then $q$ is itself in the image of $f_X^G : P_R \to P_Q$.

Proof. In the proof of Lemma 92, we see that now $f_Y((\phi_R \circ \psi_R)(\{y_i\})) = \{y\}$, so $\sigma_i = X_y$. □

Comment: Corollary 93 helps to explain the example of pages 73 and 77, in which a surjective morphism generated the entire image poset even though the induced maps on the Dowker complexes were not surjective. Namely:

In the Möbius relation $M$ of page 72, singletons are unmoved by the closure operators. In the tetrahedral relation $T$ of page 45, maximal simplices are dual to singletons. Intersections of all the maximal simplices in the tetrahedral relation generate all of $P_T$. These maximal simplices come from dualizing images of singletons of the Möbius relation.

More specifically: Even though one merely has $f_X(\{1,2,5\}) = \{1,4\}$, it is also true that $f_X^G(\{1,2,5\}, \{a\}) = (\{1,3,4\}, \{a\})$. The G-morphism $f_X^G$ therefore supplies the apparently missing simplex $\{1,3,4\}$.

More  generally,  the  following  theorem  describes  the  process: 

Theorem 40 (Lattice Surjectivity). Let $R$ and $Q$ be tight nonvoid relations. Suppose $f : R \to Q$ is a surjective morphism. For any $q \in P_Q$:

• $q = \bigwedge_j \bigvee_i q_{ji}$, with each $q_{ji}$ in the image of $f_X^G : P_R \to P_Q$,

• $q = \bigvee_k \bigwedge_t q'_{kt}$, with each $q'_{kt}$ in the image of $f_Y^G : P_R \to P_Q$.

Proof. Write $q = (\sigma,\gamma) \in P_Q$. Then $\sigma = \psi_Q(\gamma) = \bigcap_{y \in \gamma} X_y$.

So $q = \bigwedge_{y \in \gamma} q_y$, with each $q_y \in P_Q$ of the form $(X_y, (\phi_Q \circ \psi_Q)(\{y\}))$.

By Lemma 92, for each $y \in \gamma$, we have that $q_y = \bigvee_i q_{y,i}$, with each $q_{y,i}$ in the image of $f_X^G : P_R \to P_Q$ and with $i$ in some finite index set $I(y)$. Thus:

$$q = \bigwedge_{y \in \gamma}\ \bigvee_{i \in I(y)} q_{y,i}.$$

The other form follows by dualizing the previous arguments. □

J  A  Few  More  Examples


J.l  Local  Spheres  versus  Global  Contractibility 


The  reader  may  wonder  whether  privacy  preservation  always  requires  a  relation  to  exhibit 
homology  in  its  Dowker  complexes.  The  answer  is  that  links  of  individuals  must  have  homology, 
by  Theorems  9  and  10  on  pages  28  and  29,  but  the  overall  relation  need  not. 


Figure 57: Relation $D$ and its Dowker complex $\Phi_D$. The complex is a triangulation of the Dunce Hat, a contractible space (the seemingly bounding edges actually touch, as suggested by the vertex labels). The Dunce Hat has no free faces, indicating that $D$ preserves attribute privacy. (Vertices of $\Phi_D$ are attributes. Triangles are labeled with their generating individuals.)


Consider for example the relation $D$ of Figure 57. There are 17 individuals, each with three attributes. The figure also shows $\Phi_D$. We can see that there are no free faces, so the relation preserves attribute privacy by Lemma 61 on page 94. Moreover, each link $\mathrm{Lk}(\Psi_D, x)$ is homotopic to a circle $S^1$. Indeed, viewed from attribute space, that link is exactly the boundary of a triangle for each individual. Figure 58 shows such a link for individual #10. The link relation has a large number of individuals but only three attributes. So Theorem 9 holds and there is homology in the link. There is however no homology in the attribute complex of the relation $D$ itself; the simplicial complex $\Phi_D$ is a triangulation of the Dunce Hat, a nontrivially contractible space.

Although $D$ preserves attribute privacy, it does not preserve association privacy. Individuals 1 and 12 share exactly one attribute (namely h), but do so with 4 additional people (namely 3, 7, 11, and 13). If attributes represent shared dinners, then in some cases one can infer all the guests at a dinner after having seen as few as two guests. (Attribute privacy means that one cannot infer additional dinners attended by a guest simply from having observed that guest at a particular dinner or two.)

Figure 58: Relation $Q$ represents $\mathrm{Lk}(\Psi_D, 10)$ from Figure 57. Also shown is the attribute Dowker complex $\Phi_Q$. It is the boundary of a triangle, so homotopic to $S^1 = S^{k-2}$. Since individual #10 has three attributes and $1 = 3 - 2$, that means relation $D$ preserves attribute privacy for individual #10. (Vertices of $\Phi_Q$ are attributes. Edges are labeled with their generating individuals. Notice that the edge $\{b,c\}$ is generated by two individuals. Whereas most edges in $\Phi_D$ are shared by only two triangles, edge $\{b,c\}$ is shared by three triangles; it is one of those edges that are glued to two others in the Dunce Hat representation. Individuals who generate just vertices are not shown in $\Phi_Q$.)

J.2  Disinformation

Privacy  loss  is  possible  when  there  is  a  free  face  in  the  relevant  Dowker  complex.  One  way  to 
preserve  privacy  is  to  eliminate  such  free  faces.  Earlier  in  the  report,  we  studied  morphisms 
between  relations  as  a  possible  way  to  transform  data  so  as  to  reduce  privacy  loss.  Ideally,  for 
attribute  privacy,  the  goal  of  such  a  transformation  is  to  map  onto  a  relation  whose  attribute 
complex  has  no  free  faces.  We  saw  that  such  transformations  need  not  always  exist,  for 
topological  reasons,  unless  one  is  willing  to  introduce  discontinuities,  that  is,  discard  knowledge 
of  some  relationships  in  the  underlying  spaces. 

Alternatively,  one  could  imagine  embedding  a  relation  within  another  that  does  preserve 
privacy.  Of  course,  at  the  extreme,  one  simply  embeds  the  given  relation  in  a  huge  relation 
that  looks  like  a  perfect  sphere.  Now  there  is  privacy  but  the  same  mechanism  that  provides 
privacy  reduces  utility.  Nonetheless,  one  has  not  discarded  relationships,  merely  surrounded 
them  with  disinformation.  We  saw  an  example  of  that  early  on,  when  we  added  a  single 
attribute  to  relation  H  in  the  example  of  Section  3.1,  in  order  to  remove  the  inference  that 
someone had cancer. If one has a separate mechanism for discerning fake entries from legitimate entries, then one can see past the disinformation; in the earlier example that would entail having a (presumably safely encrypted) memory of which single entry in the relation is false.


Figure 59: Relation $M$ revisited along with its attribute complex $\Phi_M$.


Figure 59 revisits our earlier Möbius strip relation, showing the relation $M$ and its attribute complex $\Phi_M$. Privacy loss occurs when someone observes two attributes that make up an edge on the boundary of the Möbius strip, such as $\{b,d\}$. Given the relation, the observer can infer a third attribute and identify the underlying individual, in this case infer attribute $c$ and identify individual #3.

In  order  to  preserve  attribute  privacy,  one  might  consider  adding  decoy  individuals  whose 
so-called  attributes  include  those  edges,  making  them  non-free,  thus  removing  that  inference 
mechanism. Figure 60 does so by doubling the number of individuals.

Figure 60: Relation $MM$ adds five decoy individuals. The attribute complex $\Phi_{MM}$ entails gluing two Möbius strips together.

The additional five individuals form their own Möbius strip. The figure therefore describes the overall attribute complex $\Phi_{MM}$ as two Möbius strips, with edges shared between the two strips, as suggested by the vertex labels. The overall attribute complex amounts to gluing the two Möbius strips together, boundary to zigzag spine. The resulting attribute complex is the 2-skeleton of the full complex on the attribute set $\{a,b,c,d,e\}$. It therefore is homotopic to the wedge of four two-spheres: $S^2 \vee S^2 \vee S^2 \vee S^2$.

Each edge of $\Phi_{MM}$ is now shared by three triangles. There are no free faces.

There is no attribute inference. (There is association inference.)

Moreover, the complex is sufficiently isotropic that one cannot say a priori which individuals
are  real  and  which  are  decoys,  even  if  one  knows  that  there  might  be  decoys.  Of  course,  the 
actual  curator  of  the  relation  would  need  some  secure  mechanism  to  separate  truth  from  fiction, 
that  is,  to  peel  apart  the  gluing.  Regardless,  real  individuals  may  be  identified  upon  seeing  all 
their  attributes  (and  only  then). 
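The counting claims about the glued complex can be verified directly. The sketch below assumes, as the 2-skeleton description implies, that the ten rows of $MM$ are exactly the ten 3-element subsets of $\{a,b,c,d,e\}$; the actual row labels of Figure 60 are not reproduced here. It counts the triangles on each edge and computes the Euler characteristic, whose reduced value 4 matches a wedge of four two-spheres.

```python
from itertools import combinations

attributes = "abcde"

# 2-skeleton of the full simplex on five vertices: all edges and all triangles.
edges = [frozenset(e) for e in combinations(attributes, 2)]
triangles = [frozenset(t) for t in combinations(attributes, 3)]

# Number of triangles containing each edge; an edge lying in exactly one
# triangle would be a free face.
triangles_per_edge = {e: sum(1 for t in triangles if e <= t) for e in edges}

# Euler characteristic: vertices - edges + triangles.
euler = len(attributes) - len(edges) + len(triangles)

print(sorted(set(triangles_per_edge.values())))
print(euler, euler - 1)
```

Every edge lies in three triangles, so no edge is free, and the reduced Euler characteristic $5 - 1 = 4$ agrees with $S^2 \vee S^2 \vee S^2 \vee S^2$.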

J.3  Insufficient  Representation 

In this subsection we show that if there are fewer than $2^k$ individuals in a relation that has $2k$ attributes representing $k$ bits, then the relation cannot preserve attribute privacy for everyone. The basic reason is that fewer than $2^k$ individuals amounts to cutting out a piece of the full attribute sphere $S^0 * S^0 * \cdots * S^0$, exposing some free faces in $\Phi_R$. By similar intuition, it is not necessarily true that privacy loss will occur if there are fewer than, say, $3^k$ individuals in a relation that has $3k$ attributes representing $k$ trivalent pieces of information. After all, bits are a special case of such tri-values, so one can preserve privacy with certain $2^k$ individuals. Thinking topologically, the full attribute space for such tri-values looks like $(S^0 \vee S^0) * (S^0 \vee S^0) * \cdots * (S^0 \vee S^0)$. Cutting out a piece of that space does not necessarily create free faces, as one can see by simple example.

Definition 94 (Binary Attribute Pair). By a binary attribute pair we mean two mutually exclusive attributes, written $y$ and $\bar{y}$. No individual can have both attributes. Moreover, in what follows we will assume that every individual has exactly one attribute from any such pair.

Lemma 95 (Privacy Requires Many Individuals). Suppose $Y = \{y_1, \bar{y}_1, y_2, \bar{y}_2, \ldots, y_k, \bar{y}_k\}$, with $\{y_i, \bar{y}_i\}$ a binary attribute pair, for $i = 1, \ldots, k$, and $k \geq 1$.

Let $R$ be a relation on $X \times Y$, with $X \neq \emptyset$, such that every individual $x \in X$ has as attributes exactly one of $\{y_i, \bar{y}_i\}$, for each $i = 1, \ldots, k$. Let $n$ be the number of distinct rows of $R$.

Then $R$ preserves attribute privacy if and only if $n = 2^k$.

Proof. Observe that each row of $R$ has exactly $k$ nonblank entries, so each maximal simplex of $\Phi_R$ consists of exactly $k$ vertices. Moreover, no row of $R$ is contained in another row unless the two rows are identical. We may therefore assume, without loss of generality, that all rows of $R$ are distinct and incomparable. Consequently, every $x \in X$ is uniquely identifiable. We can think of each individual $x \in X$ as defining a unique $k$-bit number, with one bit per binary attribute pair, as determined by that individual's row, $Y_x$. All possible $k$-bit numbers are represented by $X$ if and only if $n = 2^k$.

I. Suppose that $n = 2^k$.

Showing that $\Phi_R$ contains no free faces would establish that $R$ preserves attribute privacy, by Lemma 61 on page 94. To show that $\Phi_R$ contains no free faces, it is enough to show that, for every maximal $\gamma \in \Phi_R$ and every $y \in \gamma$, the simplex $\gamma \setminus \{y\}$ is contained in some maximal simplex of $\Phi_R$ other than just $\gamma$.

Write $\kappa = \gamma \setminus \{y\}$. Since $y$ belongs to a binary attribute pair, we can construct a new set $\gamma'$ from $\gamma$ by replacing $y$ with its "opposite". Specifically: $\gamma' = \kappa \cup \{\bar{y}_i\}$, if $y = y_i$, and $\gamma' = \kappa \cup \{y_i\}$, if $y = \bar{y}_i$. Since $n = 2^k$ there is an $x \in X$ for which $Y_x = \gamma'$. So $\gamma' \in \Phi_R$, telling us $\kappa$ is not free.

II. Suppose that $R$ preserves attribute privacy.

By Lemma 62 on page 94, $\Phi_R$ contains no free faces.

Let $\gamma$ be a maximal simplex of $\Phi_R$ and $y \in \gamma$. Define $\kappa = \gamma \setminus \{y\}$. Construct $\gamma'$ as in part I above. Consider the collection $\Gamma = \{\eta \in \Phi_R \mid \kappa \subsetneq \eta\}$. The only possible set that might be in $\Gamma$ besides $\gamma$ is $\gamma'$. Since $\Phi_R$ contains no free faces, $\Gamma = \{\gamma, \gamma'\}$.

Now vary $y$ across $\gamma$ and then repeat the process for all $\gamma'$ thus constructed. The transitive closure of this process generates $2^k$ distinct maximal simplices in $\Phi_R$, each of which corresponds to a unique $x \in X$. So $n = 2^k$. □
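Both directions of Lemma 95 can be illustrated computationally for small $k$. In the sketch below, the encoding of the attribute pair $\{y_i, \bar{y}_i\}$ as the tuples `(i, 0)` and `(i, 1)` and the function names are assumptions made for illustration. A free face is detected as a codimension-one face of a maximal simplex that lies in no other maximal simplex, which is the criterion used in part I of the proof.

```python
from itertools import product


def rows_for_bits(k):
    """All 2**k rows: each row picks attribute (i, 0) or (i, 1) for every pair i."""
    return [frozenset((i, b) for i, b in enumerate(bits))
            for bits in product((0, 1), repeat=k)]


def free_faces(maximal):
    """Codimension-one faces of maximal simplices lying in exactly one maximal simplex."""
    maximal = set(maximal)
    found = set()
    for gamma in maximal:
        for y in gamma:
            face = gamma - {y}
            if sum(1 for m in maximal if face <= m) == 1:
                found.add(face)
    return found


k = 3
full = rows_for_bits(k)       # n = 2**k distinct rows
punctured = full[1:]          # drop one individual, so n < 2**k

print(len(full), len(free_faces(full)))
print(len(punctured), len(free_faces(punctured)))
```

With all $2^k$ rows present, flipping any one bit of a row yields another row, so no face is free. After removing a single row, each of its $k$ neighbors (rows differing in one bit) exposes a free face.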


J.4  A  Structural  Inference  Example:  Passengers  on  Ferries

Figure 61: This time series represents 17 different passengers on 12 different ferry crossings. Each dot represents a passenger on a crossing. As a visual aid, solid lines connect multiple crossings by the same passenger at consecutive departure times, while dashed lines connect multiple crossings by the same passenger at non-consecutive departure times.


Imagine  a  commuter  ferry  that  crosses  back  and  forth  between  downtown  and  an  island. 
Passengers  pay  electronically  as  they  enter  the  ferry,  so  there  is  a  record  of  who  is  on  which 
crossing.  Figure  61  shows  a  hypothetical  time  series  for  12  crossings  during  a  day  in  which 
17  passengers  took  the  ferry,  some  of  whom  crossed  several  times.  Figure  62  shows  the 
corresponding  complex:  vertices  are  individuals;  each  triangle  represents  a  particular 
crossing.  (Each  ferry  crossing  had  three  passengers  in  this  simplified  example.) 

The  waitress  in  the  ferry’s  coffee  shop  observes  four  individuals  ordering  coffee  and 
conversing  during  the  day,  appearing  in  pairs  on  different  crossings.  She  observes  exactly 
four  pairs,  never  the  same  pair  twice.  Who  are  the  individuals? 

It  is  convenient  to  represent  the  waitress’s  observations  as  a  complex  itself.  Figure  63  does 
so.  Vertices  are  now  the  four  unknown  individuals;  edges  are  their  (unknown)  common  crossing 
times.  One  can  interpret  who  the  individuals  are  by  embedding  the  complex  of  Figure  63  into 
the complex of Figure 62, using one-to-one maps in both the passenger and time domains.

Figure 62: Simplicial complex $\Psi_R$ determined by the time series of Figure 61, viewed as a relation $R$. Vertices represent passengers, labeled with letters. Triangles represent ferry crossings, labeled with departure times.

Figure 63: A waitress's observations of passengers drinking coffee together at various times, represented by a simplicial complex. Vertices represent unknown but distinct passengers. Edges represent unknown but distinct crossing times.


There  are  exactly  two  such  embeddings  (modulo  index  permutations),  given  by  the  two  ways 
one  can  wrap  a  rectangle  around  the  two  holes  in  the  complex  of  Figure  62.  (Those  are  the 
only  two  “diamonds”  touching  four  different  crossing  times.)  Thus  the  individuals  are  either 
{C,G,J,K}  or  {B,  F,  I,  J} ,  as  indicated  by  Figure  64.  Either  way,  one  knows  for  sure  that 
individual  “J”  twice  had  a  conversation  over  coffee  that  day. 
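The search for such embeddings can be sketched as a small enumeration. The data of Figure 61 is not recoverable from the text, so the crossings below are hypothetical toy data, and the function name `diamonds` is likewise an assumption. The sketch looks for four distinct passengers arranged in a cycle such that consecutive passengers shared a crossing, with the four crossings all distinct, which mirrors the one-to-one requirement in both the passenger and time domains.

```python
from itertools import combinations, permutations, product

# Hypothetical toy data: crossing label -> set of passengers on that crossing.
crossings = {
    "t1": {"A", "B", "X"},
    "t2": {"B", "C", "Y"},
    "t3": {"C", "D", "Z"},
    "t4": {"D", "A", "W"},
}


def diamonds(crossings):
    """Sets of four passengers that embed the observed 4-cycle with distinct crossings."""
    passengers = sorted(set().union(*crossings.values()))

    def shared(p, q):
        # Crossings on which both p and q were present.
        return [t for t, on_board in crossings.items() if p in on_board and q in on_board]

    found = set()
    for quad in combinations(passengers, 4):
        first = quad[0]
        # Fix the first passenger and permute the rest to cover all cyclic orders.
        for rest in permutations(quad[1:]):
            cycle = (first,) + rest
            options = [shared(cycle[i], cycle[(i + 1) % 4]) for i in range(4)]
            # Require one crossing per edge of the cycle, all four different.
            if any(len(set(times)) == 4 for times in product(*options)):
                found.add(frozenset(quad))
    return found


print(diamonds(crossings))
```

On this toy data the only solution is the set of passengers A, B, C, D. Passengers such as X are excluded because both of their possible conversations would have to occur on the same crossing.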


Figure  64:  The  two  possible  embeddings  of  the  complex  of  Fig.  63  into  the  complex  of  Fig.  62. 


Moreover,  each  of  these  embeddings  places  a  time  ordering  on  the  embedded  edges,  from 
which  one  can  make  inferences  as  to  who  might  have  transmitted  information  to  whom.  For 
instance,  for  the  embedding  involving  individuals  {B,  F,  I,  J},  one  sees  that  individual  “J”  could 
have been both the initial source and final recipient of information.